Academic journal article Perception and Psychophysics

A Stereo Advantage in Generalizing over Changes in Viewpoint on Object Recognition Tasks

Article excerpt

In four experiments, we examined whether generalization to unfamiliar views was better under stereo viewing or under nonstereo viewing across different tasks and stimuli. In the first three experiments, we used a sequential matching task in which observers matched the identities of shaded tube-like objects. Across Experiments 1-3, we manipulated the presentation method of the nonstereo stimuli (having observers wear an eye patch vs. showing observers the same screen image) and the magnitude of the viewpoint change (30° vs. 38°). In Experiment 4, observers identified "easy" and "hard" rotating wire-frame objects at the individual level under stereo and nonstereo viewing conditions. We found a stereo advantage for generalizing to unfamiliar views in all the experiments. However, in these experiments, performance remained view dependent even under stereo viewing. These results strongly argue against strictly 2-D image-based models of object recognition, at least for the stimuli and recognition tasks used, and suggest that observers used representations that contained view-specific local depth information.

We easily recognize many familiar and unfamiliar objects that vary in shape, color, texture, movements, and so on. Although any or all of these properties can be used for recognition, it is largely assumed that recognition is based predominantly on matching shapes that are recovered from the visual input to shapes that are encoded in short- and long-term visual memory. This assumption has several motivations. First, shape can be derived from different sources of visual information, such as motion or stereo information (Bulthoff, 1991; Marr, 1982). Second, because of multiple inputs to the shape representation, shape is robust to changes to or degradation of the visual input. Finally, in most circumstances, shape can be used to reliably identify objects (see, e.g., Hayward, 1998).

Despite the importance of shape for object recognition, how 3-D shape is represented for recognition remains elusive (Bulthoff, Edelman, & Tarr, 1995). In this regard, one outstanding issue is the extent to which the object representation encodes object-centered 3-D depth and structure (see, e.g., Marr & Nishihara, 1978) as opposed to viewer-centered 2-D views (e.g., Poggio & Edelman, 1990). Another issue is the possibility that the object representation encodes some intermediate shape representation, such as view-invariant qualitative parts (e.g., Biederman, 1987) or view-specific local depth of visible surface patches, such as Marr's (1982) 2.5-D sketch (see also Edelman & Bulthoff, 1992; Williams & Tarr, 1999).

Building on previous work (Edelman & Bulthoff, 1992; Farah, Rochlin, & Klein, 1994; Humphrey & Khan, 1992), in the present study we examined the role of stereo information in object recognition, since this is a strong source of information about 3-D depth and structure, alone or in combination with other depth cues (see, e.g., Bulthoff, 1991; Bulthoff & Mallot, 1988; Landy, Maloney, Johnston, & Young, 1995). Specifically, we examined whether the addition of stereo information facilitates the recognition of objects when they are presented at an unfamiliar viewpoint or at a familiar viewpoint. We did not test novel objects with distinctive part structure, which is often found in real-world objects (Biederman, 1987). Rather, we varied the recognition task and stimuli in other important ways over four experiments in an effort to explore at least some of the conditions under which the visual system may encode depth and 3-D structure information. Our secondary aim was to compare our results with those of previous studies in which similar novel objects with no distinctive part structure were used.

Edelman and Bulthoff (1992) initially found that subjects were more accurate at recognizing novel objects under stereo than under nonstereo viewing. Their stimuli were computer-generated wire forms constructed by joining thin straight tubes together end to end. …
