ROBUST COMPUTATION OF INTRINSIC IMAGES FROM MULTIPLE CUES
JOHN (YIANNIS) ALOIMONOS University of Maryland
CHRISTOPHER M. BROWN University of Rochester
The central problem of computer vision may be stated as follows:
Given one image or a sequence of images of a moving or stationary object or scene, taken by a monocular (one-eyed) or polynocular (many-eyed) observer that may itself be moving or stationary, understand the object or scene and its three-dimensional properties.
All the terms in the above definition are well defined, with the exception of the term understand. What does understand really mean with respect to this problem? There have been two main approaches to this question in computer vision: reconstruction and recognition (Figure 2.1). The reconstruction school attempts to recover the physical parameters of the visual world, such as the depth or orientation of surfaces, the boundaries of objects, and the direction of light sources. The recognition school aims to recognize or describe objects, and studies processes whose end product is some piece of behavior, such as a decision or a motion. Both schools have strong ties with psychology and neuroscience, and it seems likely that the two will merge into a new school that may find an answer to the vision problem.
Physical parameters derived from vision can be classified into two categories: retinotopic and nonretinotopic. Nonretinotopic parameters can be divided into global features (such as ego motion or light source direction) and objects and relations (Ballard, 1984). Retinotopic parameters are spatially indexed at every image point. Retinotopic parameters (shape,