lowing a bi-directional flow of information and processing across the levels. The machine will employ MIMD processing at the high level and Multi-SIMD (i.e., parallel local SIMD) under MIMD control at both the intermediate and low levels.
This chapter discusses a range of problems in computer vision and describes the VISIONS system,1 an approach to the construction of a general vision system that relies on knowledge-based techniques for image interpretation. The goal of this effort is the construction of a system capable of interpreting natural images of significant complexity. Over the past 12 years, the VISIONS group at the University of Massachusetts has been evolving the system while applying it to natural scenes, such as house and road scenes, aerial images, and biomedical images. Our philosophy is that it is reasonable to expect that each new domain will require a different knowledge base, but most of the system should remain the same across task domains. This chapter documents the status of the system in mid-1986 as its development continues. Note that we do not attempt to carefully survey the literature on knowledge-based vision; partial reviews may be found in [16, 17, 29, 31, 54] and some representative individual research efforts are described in [13, 15, 18, 19, 32, 42, 48, 52, 63, 70, 82, 88, 89, 92, 93, 99, 100, 104, 107, 110, 119, 122, 123, 128, 133, 134].
Our research has concentrated to a large extent on the identification of objects in static color images of natural scenes by associating two-dimensional image events with object descriptions [19, 20, 21, 53, 54, 55, 56, 57, 58, 59, 60, 61, 72, 106, 109, 111, 113, 114, 115, 116, 117, 118, 148, 149, 150]. However, at a more general level, the VISIONS design utilizes many stages of processing in the transformation from "signals" to "symbols," or to use more specific terminology, from two-dimensional image events to object labels and three-dimensional hypotheses. Unless the domain is extremely simple and heavily constrained so that object matching processes can be applied directly to the image (e.g., via template matching), there must be some form of sensory processing that extracts information from an image to produce an intermediate representation.