We propose a logical framework for depiction and interpretation that formalizes image domain knowledge, scene domain knowledge and the depiction mapping between the image and scene domains. This framework requires three sets of axioms: image axioms, scene axioms and depiction axioms. An interpretation of an image is defined to be a logical model of these axioms.
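To make the definition concrete, here is a minimal Python sketch. It assumes a toy map image in the spirit of Mapsee: two chains (line segments) and two regions. It also assumes a handful of illustrative axioms that are not the paper's own. Each candidate interpretation assigns a scene kind to every image object. The interpretations are exactly the assignments that satisfy all the axioms, and this sketch finds them by brute-force enumeration. The names `BOUNDS`, `TEES`, `satisfies_axioms` and `interpretations` are hypothetical.

```python
from itertools import product

# Hypothetical image description: ground facts about a toy map image.
CHAINS = ["c1", "c2"]
REGIONS = ["r1", "r2"]
BOUNDS = [("c1", "r1", "r2")]  # chain c1 separates region r1 from region r2
TEES = [("c2", "c1")]          # chain c2 ends on chain c1 at a T-junction

# Scene vocabulary: the kinds of geographic object each image object may depict.
CHAIN_KINDS = ("road", "river", "shore")
REGION_KINDS = ("land", "water")

def satisfies_axioms(interp):
    """Check illustrative scene/depiction axioms (assumed, not the paper's set)."""
    for chain, left, right in BOUNDS:
        sides = {interp[left], interp[right]}
        if interp[chain] == "shore":
            if sides != {"land", "water"}:  # a shore separates land from water
                return False
        elif sides != {"land"}:  # roads and rivers have land on both sides
            return False
    for stem, bar in TEES:
        if interp[stem] == "shore":  # assume shores never end at a T-junction
            return False
        if interp[stem] == "road" and interp[bar] != "road":
            return False  # assume a road ends only on another road
        if interp[stem] == "river" and interp[bar] not in ("river", "shore"):
            return False  # assume a river ends only on a river or a shore
    return True

def interpretations():
    """Yield every assignment of scene kinds that is a model of the axioms."""
    names = CHAINS + REGIONS
    domains = [CHAIN_KINDS] * len(CHAINS) + [REGION_KINDS] * len(REGIONS)
    for values in product(*domains):
        interp = dict(zip(names, values))
        if satisfies_axioms(interp):
            yield interp

if __name__ == "__main__":
    for model in interpretations():
        print(model)
```

With these assumed axioms the image has four models. The description alone therefore leaves the map ambiguous: `c1` could depict a road, a river or a shore.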
The approach is illustrated by a case study: a reconstruction, in first-order logic, of a simplified map understanding program, Mapsee. The reconstruction starts with a description of the map and a specification of general knowledge about maps, geographic objects and their depiction relationships. For this simple map world we show how the task-level specification can be refined into a provably correct implementation. The refinement applies model-preserving transformations to the initial logical representation, yielding a set of propositional formulas. The implementation can then use standard constraint satisfaction techniques to find the set of models of these propositional formulas. In addition, we sketch preliminary logical treatments of image queries, contingent scene knowledge, ambiguity in image description, occlusion, complex objects, preferred interpretations and image synthesis.
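Enumerating every assignment quickly becomes infeasible as the map grows, and constraint propagation is one standard remedy. The Python sketch below applies generalized arc consistency and then enumerates the surviving values. It assumes that each finite-domain variable stands in for a group of propositional atoms (for example `road(c1)`, `river(c1)`, `shore(c1)`) of which exactly one is true. The map, the axioms and the contingent fact that region `r2` is water are hypothetical illustrations, not Mapsee's actual constraint set. The function names are likewise invented for this example.

```python
from itertools import product

CHAIN_KINDS = {"road", "river", "shore"}
REGION_KINDS = {"land", "water"}

def initial_domains():
    """Hypothetical map: chains c1, c2 and regions r1, r2."""
    return {
        "c1": set(CHAIN_KINDS),
        "c2": set(CHAIN_KINDS),
        "r1": set(REGION_KINDS),
        "r2": {"water"},  # hypothetical contingent fact: r2 is known to be water
    }

def bounds(chain, left, right):
    """Assumed axiom: chain separates regions left and right."""
    if chain == "shore":
        return {left, right} == {"land", "water"}
    return left == right == "land"  # roads and rivers have land on both sides

def tee(stem, bar):
    """Assumed axiom: chain `stem` ends on chain `bar` at a T-junction."""
    if stem == "shore":
        return False  # assume shores never end at a T-junction
    if stem == "road":
        return bar == "road"
    return bar in ("river", "shore")  # a river ends on a river or a shore

CONSTRAINTS = [(("c1", "r1", "r2"), bounds), (("c2", "c1"), tee)]

def revise(domains, scope, pred, var):
    """Remove values of var with no supporting tuple under pred; report change."""
    i = scope.index(var)
    changed = False
    for value in sorted(domains[var]):
        choices = [{value} if j == i else domains[v] for j, v in enumerate(scope)]
        if not any(pred(*combo) for combo in product(*choices)):
            domains[var].discard(value)
            changed = True
    return changed

def generalized_arc_consistency(domains, constraints):
    """Prune domains in place; return False if some domain becomes empty."""
    queue = [(scope, pred, var) for scope, pred in constraints for var in scope]
    while queue:
        scope, pred, var = queue.pop()
        if revise(domains, scope, pred, var):
            if not domains[var]:
                return False
            # var shrank, so recheck the other variables of every constraint on var.
            for s2, p2 in constraints:
                if var in s2:
                    queue.extend((s2, p2, v) for v in s2 if v != var)
    return True

def models(domains, constraints):
    """Enumerate assignments over the pruned domains that satisfy every axiom."""
    names = sorted(domains)
    for values in product(*(sorted(domains[n]) for n in names)):
        a = dict(zip(names, values))
        if all(pred(*(a[v] for v in scope)) for scope, pred in constraints):
            yield a

if __name__ == "__main__":
    doms = initial_domains()
    if generalized_arc_consistency(doms, CONSTRAINTS):
        for m in models(doms, CONSTRAINTS):
            print(m)
```

In this example propagation alone reduces every domain to a single value, so the final enumeration checks just one assignment. If the fact about `r2` is dropped, propagation only rules out `c2` being a shore, and four models remain.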
This approach provides a formal framework for analyzing, and going beyond, existing systems such as Mapsee, and for understanding the role of constraint satisfaction techniques in interpretation. It can serve as a foundation for the specification, design and implementation of vision and graphics systems whose algorithm-level implementations are correct with respect to their task-level specifications.