FORM PERCEPTION USING TRANSFORMATION NETWORKS: POLYHEDRA
DANA H. BALLARD
The University of Rochester
In order to navigate and manipulate objects in the environment, one must have a model of oneself and the surroundings. The key issues are: What form should these models take? How are they constructed from visual input? How are they used? We argue that the use of object-centered coordinate frames to find transformations is an essential ingredient in such models. Furthermore, such transformations can be economically computed in terms of a hierarchical network where levels in the network represent discrete functional values of geometrical constraints.
For a long time, research on the computational problem of perception has struggled with the basic questions of extracting explicit descriptions from input that is mostly visual. However, as these problems become better understood and formative solutions become available, the more basic questions of the goals of vision and motor action start to demand attention. One useful function of vision is to allow us to interact with our geometric environment. There are several ways in which this can happen. In navigation, the goal is to keep track of our relation to a geometric reference. Motion perception may be viewed as a kind of navigation, described by the temporal change of the relation to a reference object. In segmentation, the goal is to isolate the parts of the visual field that belong to a reference object. Object identification is a memory task in which an isolated visual subfield must be correctly matched to a stored object representation.