Next: Acknowledgments Up: Three-Dimensional Object Recognition Previous: Related research on

Conclusions

One goal of this paper has been to describe the implementation of a particular computer vision system. However, a more important objective for the long-term development of this line of research has been to present a general framework for attacking the problem of visual recognition. This framework does not rely upon any attempt to derive depth measurements bottom-up from the image, although this information could be used if it were available. Instead, the bottom-up description of an image is aimed at producing viewpoint-invariant groupings of image features that can be judged unlikely to be accidental in origin even in the absence of specific information regarding which objects may be present. These groupings are not used for final identification of objects, but rather serve as "trigger features" to reduce the amount of search that would otherwise be required. Actual identification is based upon the full use of the viewpoint consistency constraint, and maps the object-level data right back to the image level without any need for the intervening grouping constructs. This interplay between viewpoint-invariant analysis for bottom-up processing and viewpoint-dependent analysis for top-down processing provides the best of both worlds in terms of generality and accurate identification. Many other computer vision systems have experienced difficulties because they attempt to use viewpoint-specific features early in the recognition process or because they attempt to identify an object simply on the basis of viewpoint-invariant characteristics. The many quantitative constraints generated by the viewpoint consistency analysis allow for robust performance even in the presence of only partial image data, which is one of the most basic hallmarks of human vision.

There has been a tendency in computer vision to concentrate on the low-level aspects of vision because it is presumed that good data at this level is a prerequisite to reasonable performance at the higher levels. However, without any widely accepted framework for the higher levels, the development of the low-level components is proceeding in a vacuum, without an explicit measure of what would constitute success. This situation encourages the idea that the purpose of low-level vision should be to recover explicit physical properties of the scene, since this goal can at least be judged in its own terms. But recognition does not depend on physical properties so much as on stable visual properties. This dependence on stable visual properties is necessary so that recognition can occur even in the absence of the extensive information that would be required for the bottom-up physical reconstruction of the scene. If a widely accepted framework could be developed for high-level visual recognition, then it would provide a whole new set of criteria for evaluating work at the lower levels. We have suggested examples of such criteria in terms of viewpoint invariance and the ability to distinguish significant features from accidental instances. If such a framework were adopted, then rapid advances could be made in recognition capabilities by independent research efforts to incorporate many new forms of visual information.


David Lowe
Fri Feb 6 14:13:00 PST 1998