Andrew Csinger
Department of Computer Science
University of British Columbia, Vancouver, Canada
csinger@cs.ubc.ca
In building effective multimedia interfaces, a chief limitation of traditional presentation design is the inability to meet individual user expectations at run-time. Recent technology offers new and unexplored possibilities for on-line design of individualized presentations that surpass the limits of the ``one size fits all'' approach forced onto books by the demands of the printing press. Rather than just adding horsepower to traditional techniques, I investigate user modelling strategies for intelligent multimedia interfaces.
I call my approach intent-based authoring to reflect its underlying emphasis on achieving the communicative goals of an author. This authorial intention is provided in the absence of the eventual viewer, and consists of a logical specification of the presentation, the actual form and content of which are determined later by characteristics of individual viewers.
Models of the user/viewer and of the author are needed to support this kind of run-time determination of form and content. I overview our minimal-AI approach to user modelling and to representing the intention of the author, and I introduce a working prototype we have built to demonstrate these ideas.
Representation
I use a variant of the Theorist framework for hypothetical
reasoning [3]. Given a set $F$ of formulae (the facts), and a set $H$
of formulae (the assumables), an explanation of a closed formula $g$
is a consistent set $F \cup E$ that implies $g$, where $E$ is a set of
ground instances of elements of $H$. Elements of $E$ are called
assumptions.
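
The definition above can be illustrated with a minimal sketch. It assumes a propositional Horn-clause simplification of Theorist and enumerates candidate assumption sets by brute force; it is not the best-first Prolog reasoner of the prototype. All atom names, the rule base, and the helper names (`closure`, `consistent`, `explanations`) are hypothetical.

```python
from itertools import combinations

# Hypothetical facts: Horn rules written as (head, body-atoms).
FACTS = [
    ("show_lab_tour", {"interested_in_research"}),
    ("show_intro", {"new_to_department"}),
]
# Hypothetical integrity constraints: atoms that may not hold together.
CONSTRAINTS = [{"new_to_department", "is_faculty"}]
# Hypothetical assumables (already ground, since this sketch is propositional).
ASSUMABLES = ["interested_in_research", "new_to_department", "is_faculty"]


def closure(assumptions, rules):
    """Forward-chain the Horn rules starting from the given assumptions."""
    known = set(assumptions)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


def consistent(known, constraints):
    """A set of atoms is consistent if it violates no integrity constraint."""
    return not any(c <= known for c in constraints)


def explanations(goal, rules=FACTS, assumables=ASSUMABLES,
                 constraints=CONSTRAINTS):
    """Return the minimal assumption sets E such that the facts together
    with E are consistent and imply the goal."""
    found = []
    for size in range(len(assumables) + 1):
        for combo in combinations(assumables, size):
            e = frozenset(combo)
            if any(prev <= e for prev in found):
                continue  # a smaller explanation already covers this set
            known = closure(e, rules)
            if goal in known and consistent(known, constraints):
                found.append(e)
    return found
```

For example, `explanations("show_intro")` yields the single assumption set containing `new_to_department`, while a goal that no rule or assumable supports yields an empty list.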
Some of the facts and the assumables
are supplied by knowledge
engineers prior to system use. Such knowledge includes domain-independent
and media-dependent presentation knowledge and expertise. One or more
authors provide representations of their communicative goals, or
intentions, and other specialists may contribute further knowledge. Some
of these roles are interchangeable: an author may contribute general
knowledge and thereby function as knowledge engineer, a viewer might form
a query expressing his own intention as an information seeker and thereby
function as an author, and so on. These knowledge bases, along with the
user's activity at an interface, comprise the inputs to the system.
This work extends the Theorist formalism to incorporate both recognition
and design into the same framework (see [2]). The set of assumables $H$ is
partitioned into the set $R$ of assumables available for user recognition,
and the set $D$ of assumables available for presentation design. $R$ is
in turn partitioned into disjoint sets $R_1, \ldots, R_n$, where every
assumable $r$ in $R_i$ is assigned a prior probability $P(r)$; the
disjoint sets correspond to independent random variables (as in [4]).
Every assumable $d$ in $D$ is assigned a nonnegative cost $c(d)$.
Models and Designs
A model of the user is the set $M$ of recognition assumptions that
explains observations about the user. Its probability is
$P(M) = \prod_{r \in M} P(r)$, given independence of recognition
partitions. Given a model $M$, a design is a set $\Delta$ of design
assumptions that (together with the model $M$) explains the existence of
a presentation that satisfies the intention of the author; its cost is
the sum of the costs of its constituent assumptions (i.e.,
$c(\Delta) = \sum_{d \in \Delta} c(d)$).
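Note that the partitioning of the assumables $H$ partitions each
explanation into a model and a design. We define a preference relation
$\succ$ over explanations such that $E_1 \succ E_2$ if
$P(M_1) > P(M_2)$, or $P(M_1) = P(M_2)$ and
$c(\Delta_1) < c(\Delta_2)$, where $M_i$ and $\Delta_i$ are the model and
the design of explanation $E_i$. So, the ``best'' explanation consists of
the most plausible model of the user and the lowest-cost presentation.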
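
The ordering just described (most probable model first, ties broken by cheapest design) can be sketched as a lexicographic sort key. This is an illustration only: the prior probabilities, costs, assumption names, and the `Explanation` class are hypothetical, and exact fractions are used so that the equal-probability case can be tested without floating-point error.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import prod

# Hypothetical priors for recognition assumables and costs for design assumables.
PRIOR = {
    "novice": Fraction(7, 10),
    "expert": Fraction(3, 10),
    "likes_research": Fraction(6, 10),
}
COST = {"long_intro": 3, "short_intro": 1, "lab_tour": 2}


@dataclass(frozen=True)
class Explanation:
    model: frozenset   # recognition assumptions
    design: frozenset  # design assumptions


def model_probability(model, prior=PRIOR):
    """Product of the priors, assuming independent recognition partitions."""
    return prod(prior[r] for r in model)


def design_cost(design, cost=COST):
    """Sum of the costs of the design's constituent assumptions."""
    return sum(cost[d] for d in design)


def preference_key(e):
    """Sort key: higher model probability first, then lower design cost."""
    return (-model_probability(e.model), design_cost(e.design))


def best(candidates):
    """Return the most preferred explanation among the candidates."""
    return min(candidates, key=preference_key)
```

With two explanations that share the model `{novice, likes_research}`, `best` returns the one whose design costs less; an explanation with a less probable model loses regardless of its design cost.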
Scrutability
These reasoning techniques are combined with an interaction paradigm I call scrutability, whereby users critique the model in pursuit of better presentations. My approach is to display to users a critical subset of the assumptions the system has made (determined by sensitivity analysis), and permit the user to change values using an intuitive graphical user interface (GUI). When the user explicitly sets the value of an assumption, this new information is considered highly reliable. Confidence is increased in the values of other assumptions displayed in the same GUI window as well, because the user was attending to that window and might have seen those values. A Bayesian analysis module implements these considerations. Evaluation is conditioned on user action and display contents to determine a posterior probability distribution for the assumables, so that further reasoning is based on the most accurate information available.
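
One way to picture the Bayesian step is the following sketch. The likelihood models here are assumptions made for illustration and are not taken from the prototype: a user who attends to a wrong displayed value is assumed to correct it, and an explicit setting is assumed to match the truth with a fixed reliability. The function names and the parameters `p_attend` and `reliability` are hypothetical.

```python
def normalize(dist):
    """Rescale nonnegative weights so that they sum to one."""
    total = sum(dist.values())
    return {v: p / total for v, p in dist.items()}


def update_unchanged(prior, displayed, p_attend=0.5):
    """Posterior over a variable whose displayed value the user left alone.

    Assumed likelihood of 'no correction': 1 if the displayed value is
    true, and (1 - p_attend) otherwise, where p_attend is the chance the
    user actually looked at the value.
    """
    return normalize({
        v: p * (1.0 if v == displayed else 1.0 - p_attend)
        for v, p in prior.items()
    })


def update_set_by_user(prior, chosen, reliability=0.99):
    """Posterior over a variable the user set explicitly.

    Assumed likelihood of the user choosing `chosen`: `reliability` if it
    is the true value, with the remainder spread evenly over the others.
    """
    if len(prior) == 1:
        return dict(prior)
    slip = (1.0 - reliability) / (len(prior) - 1)
    return normalize({
        v: p * (reliability if v == chosen else slip)
        for v, p in prior.items()
    })
```

Under these assumptions, leaving a displayed value untouched raises its probability moderately, while setting a value explicitly moves almost all of the probability mass to it.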
The user model and the design are the outputs of the system. These are further processed as suggested above into a display of a critical subset of the user model, and into a presentation of the design to the viewer.
Both recognition and design processes are performed at run-time, but are logically separated; this separation results in easier acquisition and debugging of knowledge.
Implementation
Our prototype implementation [1] demonstrates these ideas in the domain of video authoring. Although our approach to authoring is intended to apply across multiple media, we have begun to demonstrate these ideas with video because authoring in the video medium with traditional approaches inherits and exacerbates the problems from traditional media, and because the popularity of video as a recording medium continues to grow.
The current version consists of a reasoning agent and an interface agent that interact over a TCP/IP connection. A video server agent provides its services over a general TCP/IP connection, and is being designed to handle various tape formats, as well as video disk and digital video.
The Reasoner is a best-first SICStus Prolog implementation of the assumption-based reasoning framework already introduced. The Interface is implemented on a NeXTstation with a NeXTdimension board, and the video server can currently access a video disk player and a VCR.
There are two essential components of the Interface: the Control and the User Model windows.

The User Model window, shown in the accompanying figure, implements the
interactivity paradigm I have advocated. Clicking on any element in this
window instructs the Reasoner to make the requested change to the user
model and then to calculate a new presentation based on this updated model.

The Control window, shown in the accompanying figure, contains, in
addition to the familiar virtual VCR control panel at the lower left,
controls to advance to the next clip in the current edit-list, to
return to the previous clip in the current edit-list, to replay the
current clip, and to proceed with the presentation (Go). The Show button
passes a request to the reasoning engine to calculate the next best
presentation. The No! button is simply a direct way for the user to
express dissatisfaction with the current presentation. Any activity at
the Control window is echoed to the Reasoner, which can use
plan-recognition techniques to infer the motives of the user.
I am testing the system with a body of video known as the UBC Computer Science Department Hyperbrochure, an hour-long video disk that includes an introduction to UBC's computer science department by its head, interviews with most of the faculty and staff, and walk-throughs of the laboratories. Potential viewers of the material are prospective and current graduate and undergraduate students, faculty and staff, funding agencies and industrial collaborators. All these potential users bring idiosyncratic goals and interests that the system attempts to meet with tailored presentations.