We propose a theory of depiction and interpretation that formalizes image domain knowledge, scene domain knowledge, and the depiction mapping between the image and scene domains. This theory is illustrated by specifying some general knowledge about maps, geographic objects, and their depiction relationships in first-order logic with equality.
An interpretation of an image is defined to be a logical model of the general knowledge and a description of that image. For the simple map world, we show how the task-level specification may be refined to a provably correct implementation by invoking model-preserving transformations on the logical representation. In addition, we sketch logical treatments of image querying, the incorporation of contingent scene knowledge into the interpretation process, occlusion, ambiguous image descriptions, and composition.
This approach provides a formal framework for analyzing existing systems such as Mapsee, and for understanding the use of constraint satisfaction techniques. It can also be used to design and implement vision and graphics systems that are correct with respect to the task and algorithm levels.
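
To make the idea of an interpretation as a logical model concrete, the following is a minimal sketch that enumerates the models of a toy map world by brute-force constraint checking. It is an illustration only and is not the axiomatization developed in this work: the label sets, the single adjacency constraint in `consistent`, and the image description (`regions`, `chains`) are all assumptions chosen for brevity.

```python
from itertools import product

# Hypothetical scene labels for the two kinds of image primitives.
CHAIN_LABELS = ("road", "river", "shore")
REGION_LABELS = ("land", "water")


def consistent(chain_label, left, right):
    """Assumed general knowledge: a shore separates land from water,
    while roads and rivers have land on both sides."""
    if chain_label == "shore":
        return {left, right} == {"land", "water"}
    return left == "land" and right == "land"


def interpretations(regions, chains):
    """Enumerate every labelling that satisfies the constraints.

    regions: list of region names.
    chains:  dict mapping a chain name to its (left, right) regions,
             playing the role of the image description.
    """
    chain_names = sorted(chains)
    models = []
    for region_vals in product(REGION_LABELS, repeat=len(regions)):
        r = dict(zip(regions, region_vals))
        for chain_vals in product(CHAIN_LABELS, repeat=len(chain_names)):
            c = dict(zip(chain_names, chain_vals))
            if all(
                consistent(c[n], r[chains[n][0]], r[chains[n][1]])
                for n in chain_names
            ):
                models.append({**r, **c})
    return models


# Toy image description: chain c1 separates r1 from r2,
# and chain c2 lies entirely inside r1.
regions = ["r1", "r2"]
chains = {"c1": ("r1", "r2"), "c2": ("r1", "r1")}
models = interpretations(regions, chains)
```

Under these assumed constraints the toy image has six interpretations, and `r1` is labelled `land` in every one of them, since the chain lying inside it can only be a road or a river. Exhaustive enumeration is used here only for clarity; it stands in for the constraint satisfaction techniques mentioned above rather than reproducing any of them.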