Previous research measuring visual short-term memory (VSTM) suggests that the capacity for representing the layout of objects is fairly high. In four experiments, we further explored the capacity of VSTM for layout of objects, using the change detection method. In Experiment 1, participants retained most of the elements in displays of 4 to 8 elements. In Experiments 2 and 3, with up to 20 elements, participants retained many of them, reaching a capacity of 13.4 stimulus elements. In Experiment 4, participants retained much of a complex naturalistic scene. In most cases, increasing display size caused only modest reductions in performance, consistent with the idea of configural, variable-resolution grouping. The results indicate that participants can retain a substantial amount of scene layout information (objects and locations) in short-term memory. We propose that this is a case of remote visual understanding, where observers' ability to integrate information from a scene is paramount.
The relation between the mind and the world is of central interest in psychology, and scene perception is a multifaceted example of this relation. A critical characteristic of scene perception is the amount of information that can be extracted from a scene and retained in immediate memory. Currently, considerable debate centers on this quantity.
At one end of the debate is the hypothesis that very little is held in memory beyond an attended object (see, e.g., O'Regan, 1992; Wolfe, Klempen, & Dahlen, 2000). In this hypothesis, the intuitive experience of perceiving a rich world is explained by the richness of the information available from the world via eye movements (see O'Regan, 1992). The hypothesis of minimal representation receives support from difficulties of change detection documented in many studies (e.g., Grimes, 1996; Simons, 1996).
At the other end of the debate is the hypothesis that rich representations of scenes result from combined contributions of short-term and long-term memory (e.g., Hollingworth, 2004, 2005, 2007; Irwin & Zelinsky, 2002; Melcher, 2006). Long-term memory contributions can be obviated by experimental design, limiting the represented information to short-term memory. Visual short-term memory (VSTM) for objects has been studied carefully, and many researchers have concluded that it has a sharp limit, between two and four objects (e.g., Hollingworth, 2006; Vogel, Woodman, & Luck, 2001). However, almost all of the studies supporting a sharp limit in VSTM capacity have measured memory for properties of objects. Object properties constitute only one aspect of scene memory. A second fundamental aspect is memory for the layout of objects and other structures in a scene. The representation of spatial information is a fundamental component of short-term memory models (e.g., Baddeley & Hitch, 1974; Logie, 1995).
A number of studies suggest that fairly complex layouts of objects can be retained over short-term intervals. In seminal studies of very short-term visual memory, Phillips (1974) also included longer short-term intervals. At the short-term intervals, he found accurate detection of changes in the position of a single element within 25-element arrays (e.g., above 85% in several conditions). Simons (1996) contrasted VSTM for object properties and locations and found location memory, but not object property memory, to be near the ceiling (see also Aginsky & Tarr, 2000; Alvarez & Oliva, 2007; Jiang, Olson, & Chun, 2000). Franconeri, Alvarez, and Enns (2007) recently concluded that as many as seven locations can be held in short-term memory. Rensink (2000b) obtained capacity estimates of at least nine for contrast signs of objects. Brockmole, Wang, and Irwin (2002) presented dot layouts for an integration task and estimated that the number of dots held in VSTM was about 10. A separate line of evidence for complex representations of layout is spatial priming with scenes; Sanocki and colleagues (Sanocki, 2003; Sanocki, Michelet, Sellers, & Reynolds, 2006; Sanocki & Sulman, 2009) found evidence that broadscale representations of layout information are activated in memory by a prime scene. …