Dynamic Hypervideo: Knowledge-based Annotation and Presentation of Video Documents

Andrew Csinger (csinger@cs.ubc.ca)
Kellogg S. Booth (ksbooth@cs.ubc.ca)
Steve Gribble (gribble@cs.ubc.ca)
David Poole (poole@cs.ubc.ca)
Susan E. Rathie (rathie@cs.ubc.ca)

Department of Computer Science
University of British Columbia
Vancouver, Canada V6T 1Z2

Keywords: authoring, user modelling, multimedia, video, artificial intelligence.

Authoring is the process of preparing information for presentation to users in the form of a generalized document, and authoring systems run the gamut from word-processors to video editing suites. A number of systems have been designed to support the authoring of hyperdocuments, which are documents that can be navigated by a reader who follows conceptual links provided by the author.

Traditional models of authoring enforce undesirable compile-time commitments upon both the form and the contents of documents. The information and presentation spaces of the authoring task are not clearly separated, making it impossible to automatically generate presentations of a document tailored to individual readers or viewers. The familiar book format illustrates these points: once a book is printed, there is no way to change the presentation to suit the particular needs and desires of individual readers, or groups of readers. The author has both selected and ordered all the information to be presented.

Note that hypertext authoring, although different in some respects, still falls within the traditional model just described. The authors of hyperdocuments are forced to supply (at compile-time) all the links of the hypergraph representing the information space. This space can be very large, and a new problem must be addressed: readers can get lost in hyperspace if simply given the freedom to navigate at will.

Structured document approaches separate the specification of form from the specification of content, permitting aspects of the form of the presentation to be decided at run-time. Such systems, however, still enforce commitment to content, placing the design burden almost entirely upon the human agents in the authoring cycle; either the author must supply all the content, or the reader needs to have significant resources to infer missing content. The author either overwhelms the reader with unnecessary detail, or runs the risk of being misunderstood. In any case, current systems do not support the generation of user-tailored presentations. This limitation also applies to hyper-authoring approaches.

Before user-tailored automatic presentation is possible, the specification and presentation phases must be more thoroughly decoupled. Not only does form need to be separated from content, but content must be separated from intent, the original communicative goals of the author.

Models of the users of authoring systems are needed to overcome these limitations. An author model consists of at least an intention, which is a (potentially complex) communicative goal, the multimedium analogue of the speech act. A reader model consists of a representation of the reader's goals and desires in consulting the system. Thus, while the author of a dynamic multimedia document may have intended to convince the reader of a particular argument, the reader may be time-constrained as well as generally uninterested in the subject matter; the authoring/presentation system must strike a balance and generate a presentation that fulfills the author's original intention within the constraints imposed at run-time. Note that the author and reader may be the same individual.

A manager, for instance, may need to consult the minutes of a meeting she might have missed, or which she can't remember. She may wish to know everything Smith said about the competition, but may be otherwise uninterested in the rest of the four-hour meeting.

The problems with traditional authoring approaches are particularly severe in the video domain, where the intrinsic, unstructured linearity of the medium calls for a new approach that supports both the annotation and presentation of video records.

In the annotation of video, we identify the Syntactic Ambiguity problem as the inconsistent use of nomenclatures during logging and annotation: for instance, the Group Support System (GSS) in use at a meeting may have used Smith's login id, dsmith, while the human taker of the minutes might have used his surname. Different human annotators might also use different terms for similar events in the record. Automatic retrieval methods need to deal with syntactic difficulties like these before queries can be usefully addressed. The Semantic Unpredictability problem is that an annotator does not know a priori the intervals in the video record that will be required for subsequent presentation: if the record is not indexed with references to the company's competition, it will be difficult or impossible to construct queries to retrieve such references. Both of these annotation problems can be mitigated with a knowledge-based approach.
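The Syntactic Ambiguity problem can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a hand-built alias table (`ALIASES`) that maps name variants to one canonical identifier, and a hypothetical `Annotation` record holding an interval, a speaker, and a topic; all names and sample values below are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical alias table: maps each name variant found in logs or minutes
# to one canonical identifier. The entries are illustrative assumptions.
ALIASES = {
    "dsmith": "smith",    # login id recorded by the GSS
    "smith": "smith",     # surname written by the human minute-taker
}


@dataclass(frozen=True)
class Annotation:
    start: float   # seconds from the start of the video record
    end: float
    speaker: str   # exactly as written by the logger or annotator
    topic: str


def canonical(name: str) -> str:
    """Return the canonical form of a name, or the lowercased name if unknown."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def find_intervals(annotations, speaker, topic):
    """Return (start, end) pairs whose canonical speaker and topic both match."""
    want = canonical(speaker)
    return [
        (a.start, a.end)
        for a in annotations
        if canonical(a.speaker) == want and a.topic.lower() == topic.lower()
    ]


records = [
    Annotation(120.0, 185.0, "dsmith", "competition"),  # from the GSS log
    Annotation(300.0, 420.0, "Smith", "budget"),         # from the minutes
    Annotation(500.0, 530.0, "jones", "competition"),
    Annotation(900.0, 960.0, "Smith", "competition"),    # from the minutes
]

smith_on_competition = find_intervals(records, "Smith", "competition")
```

With this table, a query for Smith retrieves intervals logged under either `dsmith` or `Smith`. The sketch addresses only the naming inconsistency; it does nothing for the Semantic Unpredictability problem, since an interval that was never annotated with a topic still cannot be retrieved by that topic.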

As concerns the presentation of video information, the need to generate user-tailored presentations is particularly pressing; once again, a knowledge-based approach permits automatic run-time edit-list preparation based upon explicit, consultable user models. The manager interested in experiencing the relevant portions of the video record of a meeting, for instance, would in effect become the author and viewer of a custom video presentation designed by the system in response to her query, in conformity with the parameters of the user model.
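Run-time edit-list preparation of this kind might be sketched as follows. This is a simplified illustration under stated assumptions, not the prototype's method: the `Segment` and `UserModel` types, their fields, and the greedy chronological selection are all hypothetical, and the user model is reduced to a set of speakers, a set of topics, and a viewing-time budget in seconds.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    start: float   # seconds from the start of the video record
    end: float
    speaker: str
    topic: str


@dataclass
class UserModel:
    # Hypothetical, explicit user model: an empty set means "no restriction".
    speakers: set = field(default_factory=set)
    topics: set = field(default_factory=set)
    time_budget: float = float("inf")   # maximum total viewing time, seconds


def build_edit_list(segments, user):
    """Select matching segments in chronological order within the time budget.

    Assumption: a simple greedy pass; a segment that does not fit in the
    remaining budget is skipped rather than trimmed.
    """
    edit_list = []
    remaining = user.time_budget
    for seg in sorted(segments, key=lambda s: s.start):
        if user.speakers and seg.speaker not in user.speakers:
            continue
        if user.topics and seg.topic not in user.topics:
            continue
        length = seg.end - seg.start
        if length > remaining:
            continue
        edit_list.append((seg.start, seg.end))
        remaining -= length
    return edit_list


meeting = [
    Segment(120.0, 185.0, "smith", "competition"),    # 65 s
    Segment(300.0, 420.0, "smith", "budget"),          # 120 s
    Segment(900.0, 960.0, "smith", "competition"),    # 60 s
    Segment(1500.0, 1700.0, "smith", "competition"),  # 200 s
]

manager = UserModel(speakers={"smith"}, topics={"competition"}, time_budget=150.0)
edit_list = build_edit_list(meeting, manager)
```

Here the manager's query yields the first two segments about the competition; the 200-second segment is left out because it exceeds what remains of the 150-second budget. A fuller system would also weigh the author's communicative intent, which this sketch does not model.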

Using a minimal AI approach, we are implementing a video authoring prototype that partially overcomes these limitations, with consequent savings in human effort in both the annotation and presentation phases of video authoring. The emphasis in this work is on the construction of an adequate model of the user; this model is used to tailor the presentation to meet the user's goals and expectations. From a hyper-authoring perspective, the object is to fashion the right links in the dynamic hypergraph that ties related parts of the multimedium together. The user model is acquired at run-time by reasoning about and interpreting the actions of the user at an interface. Since the model is dynamic, and the presentation depends upon the model, the presentation is also dynamic. The system is easily integrated into GSS environments that can supply additional automatic video indexing and logging support.
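The dependence of the presentation on a model acquired from interface actions can be illustrated with a deliberately simple sketch. It is a stand-in for the reasoning the text refers to, not a description of it: the action names, the weights in `ACTION_WEIGHTS`, and the `DynamicUserModel` class are all hypothetical.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each interface action is taken as
# evidence of interest (positive) or disinterest (negative) in a topic.
ACTION_WEIGHTS = {"replay": 2, "play": 1, "skip": -1}


class DynamicUserModel:
    """Accumulates per-topic interest scores from observed interface actions."""

    def __init__(self):
        self.interest = defaultdict(int)

    def observe(self, action, topic):
        # Unknown actions carry no evidence and leave the score unchanged.
        self.interest[topic] += ACTION_WEIGHTS.get(action, 0)

    def preferred_topics(self, threshold=1):
        """Return, in alphabetical order, topics scoring at least `threshold`."""
        return sorted(t for t, score in self.interest.items() if score >= threshold)


model = DynamicUserModel()
for action, topic in [
    ("play", "competition"),
    ("skip", "budget"),
    ("replay", "competition"),
    ("play", "staffing"),
]:
    model.observe(action, topic)

topics_now = model.preferred_topics()
```

Because the set of preferred topics changes with each observed action, any presentation selected from it changes as well, which is the sense in which the presentation is dynamic.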

This work is related to research on interactive cinema, video-on-demand service, multimedia document preparation and presentation, and artificial intelligence techniques for user modelling. Further information about these relationships is available from the authors.



csinger@cs.ubc.ca
Tue Nov 29 11:59:39 PST 1994