Academic journal article Memory & Cognition

Extrapolating Spatial Layout in Scene Representations

Can the visual system extrapolate the spatial layout of a scene to new viewpoints after only a single view? In the present study, we examined this question by investigating the priming of spatial layout across depth rotations of the same scene (Sanocki & Epstein, 1997). Participants had to indicate which of two dots superimposed on objects in the target scene appeared closer to them in space. A prime whose viewpoint differed from the target by 10° produced as much priming as a prime identical to the target; however, larger differences in viewpoint produced no reliable priming. These results suggest that a scene's spatial layout can be extrapolated, but only to a limited extent.

It is clear that a 2-D image of a scene contains considerable information about the 3-D spatial arrangement of the scene being depicted and, specifically, about the distances of objects in the scene from the vantage point from which it is depicted. A large number of studies have used simple visual stimuli to explore which types of depth cues are automatically activated when a 2-D image is presented. For example, simplified stimuli have been used to demonstrate the basic monocular and binocular cues that lead observers to interpret the size, depth, and distance of objects (for a review, see Palmer, 1999); however, few studies have examined the nature of spatial processing in real-world scenes.

Spatial information processing is especially relevant to understanding scene recognition. Both neural accounts and computational models treat spatial layout as a central component of scene processing. Early studies of the neural processing of scenes by Epstein and colleagues (e.g., Epstein & Kanwisher, 1998) identified the parahippocampal place area (PPA; a region of the posterior medial temporal lobe) as a region that responds preferentially to images of scenes (vs. faces or objects) and, more specifically, to their spatial layout, as shown by a larger response to scenes than to individual buildings (e.g., a house). More recently, studies have found that processing in the PPA may be closely linked to the structural geometry of a particular view. For instance, the PPA is sensitive to changes in the viewpoint of a scene (Epstein, Graham, & Downing, 2003; Epstein, Higgins, & Thompson-Schill, 2005). Epstein et al. (2005) had participants judge whether two sequentially presented images depicted the same place. The images could show two different places, the same place from different viewpoints, or the same place from the same viewpoint. The authors found a viewpoint-specific effect: a change in viewpoint produced less adaptation than a repeated viewpoint did. Henderson, Larson, and Zhu (2008) also found that the structural geometry of a scene influenced the PPA. They compared the activation produced by close-up scenes (e.g., a view of a desk in an office) and full scenes (e.g., the entire office). The PPA was more strongly activated by the full scenes than by the close-up scenes, which suggests that the PPA is sensitive to the real-world size of the depicted space, even when the physical size of the image is held constant.

Spatial layout also plays an important role in recent computational models of complex scene recognition. For example, the strong link between the semantics of a scene and its spatial qualities is central to the spatial envelope theory (Greene & Oliva, 2009; Oliva & Torralba, 2001, 2002; Ross & Oliva, 2010). According to this theory, a scene's semantic category can be derived from a small set of perceptual dimensions (e.g., naturalness, openness, roughness, expansion, ruggedness) that represent the spatial structure of the scene. Once values for these dimensions are calculated, the model generates a multidimensional space. From the scenes projected together within this space, semantic categories emerge (e. …
