One criticism of systems which attempt to model their users is their
inscrutability to these users. If the system acts in an infelicitous
manner which does not meet the needs of the user, and where this action
results from an error in the user model, a desirable property of the system
is scrutability.
The user should be able to reason to himself as in, for example: ``The
system is giving me information I don't need. Hmmm...Maybe it thinks
I'm a student, and not a faculty member.'' The user should be able to
``debug'' the user model in this way. Supporting this kind of interaction
imposes interesting constraints on the reasoning techniques employed by
the system.
Users should also be given the means by which to express their dissatisfaction with a presentation. This expression can be interpreted as either a request for the next best presentation, or as a request that the system select the next most plausible model of the user.
Our interaction paradigm supports both the scrutability and the dissention conditions described above.
In calculating a model that represents the user, the system makes observations upon which further reasoning is conditioned. These observations can be of a number of different types from different sources, but the most important for our purposes here are the observations of the user's interaction at the user interface (described later). Keystrokes and mouse actions can be interpreted using plan-recognition techniques to build elements of the user model, although we have not yet implemented this. Instead, we provide the user with direct access (with a graphical user interface) to salient elements of the user model, as well as with the ability to criticize these elements. It is this interaction which is used to build the user model.
A prior model is first shown to the user.
This is the user
model constructed initially by the system using only the prior
probabilities associated with recognition assumptions, and before the user
has seen any presentations designed from the model. The user provides
feedback to the system about the validity of some or all of the prior
assumptions, and the system uses this new information (observations) to
calculate a new prior model of the user, which is then re-displayed. This
cycle of prior scrutiny may repeat, until the user finally instructs
the system to design a presentation.
After having seen a presentation, and perhaps having been dissatisfied with it, the user can continue to scrutinize the model, now known as a posterior model, in a cycle of activity we refer to as posterior scrutiny. Changes made by the user to the model should result in changes -hopefully in improvements- to the presentation.
Recognition assumptions are made in the course of these abductive processes, and the system always works on the most likely model until it is proven or until another model in the queue becomes more likely [16].
A difference between prior and posterior scrutiny with model update driven
by the user is that prior update from to
may not change the
best design, or may result in a design which maps to the same
presentation; in posterior update, however, the transition from
to
must be accompanied by a different presentation, if the
user is to continue to feel that the system is paying attention to the
interaction.
For instance, if the system displays to the user the assumption that the user is a faculty member, and the user does not critique the assumption, the system may upgrade the status of the assumption if the system believes that the assumption has been seen and understood by the user. This is reasonable if: the window in which the assumption is displayed is not obscured, the text in the window is clearly rendered and is large enough, the user is not distracted by other events on the desktop, the user is not distracted by other events in the environment (babies crying, cars colliding, etc.), and so on.
The case where a user critiques assumption but does not critique
assumption
is of particular interest. Assumption
can be upgraded
because of the user's direct action, but confidence in assumption
can
be upgraded based on the user's inaction, on the grounds that the user
was actually attending to the appropriate window (because the user
did critique assumption
).
Various display strategies can be used to draw the user's attention to one or more of the displayed assumptions. A particular assumption can be highlighted (with color, font, or animation, for instance) to increase the likelihood that the user is attending to the relevant portion of the desktop, and to the relevant part of the window. Such perceptually salient display techniques can increase the reliability of the system's assumptions about the user, but require that the system model such things as the media capabilities of the user's display and interaction devices. The system would also benefit from knowing whether the window in question is partially or totally obscured by other desktop objects, which would require some degree of integration between the system and the operating system or window manager. We have not conducted a deep analysis of such desiderata for future operating systems, but bring them up here because they are compatible with our interaction paradigm, and can be represented in our reasoning framework. Nor have we made empirical investigations to decide which are the most effective display techniques, but we make what we think are reasonable decisions until these results become available.

Simple Bayesian analysis can be performed to calculate posterior
probabilities for the recognition assumables that are used. The idea that
we seek to capture here is that the user's actions at the interface are
conditioned on the state of the world, and in particular upon the user's
model of himself and the contents of the user model display window. For
simplicity, a separate Bayesian network can be derived from the schema
shown in Figure -a and used to represent these
relationships for each random variable of interest.
Some of the preceding considerations are represented in the Bayesian
network illustrated in Figure -b, where User-Type,
User-Action and Display are random variables with a
conditional probability distribution over the following states:
and
Equation 1 describes the expression corresponding to
the network in Figure -a. This expression can be
evaluated to determine the posterior (after an action by the user)
probability of any random variable (assumable) by replacing
and
with the variable and the value of interest, respectively (as per
Figure refFigfig:Bayes-b). In addition to the prior distribution of
, the conditional probabilities are needed for
.
The number of entries in this contingency table is , and they can be determined
empirically; we use what we think are reasonable values. Some of the more
important ones are:

For instance, the posterior probability that the user is of type
is given by Equation 2, where
is one of
.
Another way to look at the issue of perceptual salience is to note that the system is engaged in a dialog with the user, and that the presentation of the user model window is a communicative act by the system. The perlocutionary effect [22] of these acts is knowledge on the part of the user of the user model. The intuition is that the likelihood of achieving this communicative goal is enhanced by using appropriate presentation techniques to highlight the most important parts of the message, and that the use of such techniques licenses an increased commitment by the system to the beliefs which follow from successful communication.
A problem with displaying the model to the user in support of scrutability is that the model may be huge, if not infinite. Certainly, there may be too many recognition assumptions to be effectively (cf. perceptual salience) displayed on a screen, and far too many for quick assimilation by users. This cognitive task is one characterized by focused, serially directed visual attention. Research in psychology has shown that the time taken for such tasks is proportional to the number of objects in the visual field [23].
A solution is to display a salient subset of
, to which we refer as
the salient model, consisting of those assumptions which have the
greatest effect on the design (and therefore upon the presentation as
well). Sensitivity analysis is performed to select the critical
recognition assumptions. The cardinality of
can be varied to maintain
an appropriate pace of interaction; if the user is taking too long to
evaluate the alternatives, the number of alternatives in the next
iteration can be reduced, and vice versa. In addition, if the
alternatives are ranked, [graphical] display techniques can be used to
render this ordering to the viewer.
=cmssq8 =cmr7 =cmr10 =cmssq8 at 5.335pt =cmss12 =cmss9 =cmss8
In the following, let represent the currently most plausible model,
and
the best design that is consistent with
. We require a total
ordering of the assumptions
by which to rank these assumptions
for display to the user. More precisely, we seek a function
;
can be sorted to order the
. The examples
below refer to the following disjoint sets of recognition assumptions:
Some useful notation follows.
We want to interpret as the sensitivity of
to the assumption
. There are many ways to formulate and eventually to program
this ranking function.
One approach is to rank the assumptions of the current model according to
the expected change in the cost of the best design if is replaced by
its peer models, as in the opaque reading of Equation 3, wherein
and
.
It might be argued that, although this is a reasonable thing for the reasoning engine to calculate, the cost of the best design for one model is incommensurable with the cost of the best design for another model (remember to apply the opaque reading here).
The same approach can be taken transparently, albeit with a loss in
directness insofar as the reasoning process is concerned. Now we're
talking about evaluating Equation 3 in its transparent reading,
wherein and
.
In the opaque evaluation, for each of the
. In the transparent evaluation,
, a constant, i.e., the
best design for the most likely model
. The problem with the
transparent approach to evaluation is that we do not, in our minimal AI
reasoning framework, represent all the utilities required to evaluate this
expression. This theme emerges at various points throughout this article:
we don't want to list all these numbers, and it is not a failing of
the approach that the quantities in question are not available.
Nevertheless, in general,
may be undefined, which is to say
that it may be that
.
We may pursue a reasonable surrogate for the missing . One
way is to calculate
for each
according to the opaque
interpretation of Equation 3, and then evaluate for each peer a
metric
based on how different the [expected cost of the] best
design
is from the current best design
. Such a metric can be
as simple as the count of matching design assumptions, or may involve
something like Equation 4.
Equation 4 reads: the cost of given
is the cost of
the irrelevant part of the presentation (represented by the relevance
assumptions in
which are not in
) and the opportunity cost of
not having made the right presentation (represented by the cost of the
relevance assumptions that are in
but not in
) less the cost of
the intersection of
with opaque
(the best design for model
). In other words, the two positive terms reflect penalties for
making the wrong presentation; the first reflects a penalty for the
irrelevant components of the presentation, the second reflects a penalty
for the omission of relevant components. Replacing
in
Equation 3 with the expression in Equation 4 results
in what we consider plausible values for all the required quantities
(further justification for this approach appears elsewhere in this
article).
Still other approaches might involve comparisons of the derivation trees of designs for alternate models.
We are still exploring different ranking functions. Once calculated, the rest of the procedure is the same.
Sort for all
and let the salient model
, referred to above, consist of the first
assumptions in
this sorted list. These are the assumptions to which the design, given
the current user model, is the most sensitive.
In our interaction paradigm, we show the user the assumptions in , in
the context of the disjoint sets to which they belong; i.e., where
, we display
along
with all the names of the assumables in the disjoint set to which
belongs. For example, if
and some
, the user sees a set of
names faculty, staff, student with the actual
assumption highlighted.
We set , the number of hypotheses displayed, to some small integer like
5 or 7. In sample implementations so far, we use simple text attached to
radio buttons, so that users see not only the assumption that was made,
but the other members of the disjoint sets to which the assumption
belongs; a simple click on a radio button instructs the system to
recalculate the user model with the new assumption. (See
Section 6 for more on implementation approaches).
Joint sensitivity analysis is not performed because: 1) single faults are dramatically more likely than multiple faults, 2) accounting for multiple faults is exponential, and 3) because the effects of multiple faults would be very difficult for users to track, resulting in a huge cognitive burden that compromises our scrutability requirement.
The system also makes design assumptions, always choosing the lowest cost
alternative that is consistent with the model and with the design so far,
which results in a design as follows, where is the most likely model:
or, equivalently,
Our approach is to incrementally work in best-first order on the lowest cost design. Other approaches are, of course, possible, and one that has been suggested is to design based upon a notion of expected value, or expected cost [2].
Decision-theory has been applied by others to design tasks under uncertainty. Some of this literature (see Cheeseman [2] for a discussion) argues that the best design is the one that results from averaging over all possible [peer] models (probabilistically weighted), i.e., that the expected cost function is to be minimized to find the best presentation design:
where is the cost of design
in the context of
model
. There are reasons in our application why this may be the wrong
thing to do.
First of all, one of the goals of this research is to avoid the formidable knowledge engineering task of developing and justifying a complete contingency table of joint probabilities, and a complete table of utilities. Both of these are required to evaluate Equation 5 in its full generality.
In addition, using expected cost to determine the design would place a
remarkable burden upon the user, who would in effect be asked to model all
possible users of the system; in order to be able to ``understand''
infelicitous presentations and correct the system's misconceptions, the
user would have to know about the utilities/costs of elements of the
presentation for other, hypothetical users. The user, for instance, upon
presentation of material about a basketball game which he finds completely
irrelevant, would have to judge the value of a presentation of sports and
leisure material to all of faculty, student and staff. In particular, the
following can occur. Let be the disjoint set
and assume
the following costs:
. The user is a faculty
member, the system has assumed that the user is a faculty member, and this
assumption is clearly indicated to the user. Then,
, and
. There is no indication to the user why he is watching
an irrelevant basketball game, because this behavior is due to the high
aversion of students to research-a value not shown to the user. The
behavior of the system is inscrutable.
The two approaches often select the same presentation, since is
weighted towards the most likely model (the single largest probability
term in both approaches is the one that applies to the most likely model),
but this is not the issue. What is of concern to us here is the behavior
of the system and its impact on user interaction when the presentation is
inappropriate because of an error in the user model.
A question often unanswered in probabilistic implementations is that of
where to obtain the values of the prior probabilities. The ``best'' user
model is the most likely one, evaluated as described above, in terms of the
priors. If these priors were only as good as the a priori guesses of
the knowledge engineer, the results would be questionable indeed. Even if
they are based upon an accurate assessment of the distribution of the user
population, peculiarities will remain. For instance, the reader will
have noted that the random variables in this work have been considered as
independent. This means that sometimes important relationships can go
unrepresented: even if and
accurately describes the distribution of members of the department of
computer science, it does not reflect the striking datum that all but one
of the faculty members is male. Nor would these representations be able to
adapt to changes over time in the user population, without intervention by
a knowledge engineer.
It is possible within the representation described, to encode dependencies between random variables [17]. We could render explicit the relationship between sex and category, for instance, and continue to add other dependencies we consider important. The problem is that there may be many such relationships, and it will be difficult to decide which are important enough to encode, and which can be safely ignored. Not only is this a knowledge engineering task which we wish to avoid, but the computational complexity of the resulting Bayesian network rises dramatically with the number of arcs representing dependence.
Instead, we obtain priors from an episodic knowledge base (EKB) that tracks the user population over time. Priors thus obtained will reflect actual relationships in the user population, with a computational complexity much lower than that for a complete Bayesian analysis (?), and the significant costs of error-prone knowledge engineering are completely avoided.
Prior determination can be implemented as a procedural attachment in
Prolog. When the reasoning agent requires a prior value, a call is made
to the EKB. P(male) is originally supplied by the EKB as simply the
number of individuals in the EKB who are male, divided by the total number
of individuals (assume this value is ). Later, after the user has
indicated that he is a faculty member, the EKB is queried for the value of
P(male|faculty). The value is the number of individuals in the EKB who
are male faculty members, divided by the number of individuals who are
faculty members (assume this value is
).
We consider the EKB as a kind of user model server, a repository of information about the user population that can be queried by applications serving that population. Multiple local EKBs could be located within an institution to serve different subsets of the complete user population, and global EKBs could serve queries over the entire population by querying all of the relevant local EKBs.
The EKB is a separate agent that communicates with other agents via TCP/IP. Its design is described elsewhere []. At the beginning of a session, and whenever required after additional observations are made about the user, the EKB is queried for prior values. At the end of a session, the EKB is supplied with a new data point, a new individual; the information supplied is in fact the current, most likely user model, consisting of the highest-probability assumptions. This data point in the EKB can move over time, with continued experience with the same user.
Since the information in the EKBs is intended to be persistent, it is a design issue of some social impact whether the individuals stored in the EKB should be identifiable. If they are, then persistent models of individuals would result in highly accurate priors at the start of a new session with a user that had already used the system, or who had used another system served by the same EKB. If not, then the system must begin to model the user anonymously, from scratch, albeit with the help of a growing EKB. Individual tracking will keep the number of individuals stored in the EKB down to the number of actual users accessing the systems served by the EKB, while anonymous storage will result in a new entry in the EKB for every session by every user at every system served by the EKB. There are clearly significant advantages to individual tracking, but these must be weighed against its (still unclear) social repercussions.
There is a relationship between the notion, well-entrenched in the user modelling literature, of user stereotypes [3][4][20] and the EKB approach to determination of prior probabilities. This relationship is hinted at by Rich: ``...stereotypes can be viewed as concepts, and then they can be learned with statistical concept-learning methods.'' [21] Our approach can be seen in just this light.
The ``best model'' described so far in the rest of this paper is itself a kind of stereotype - the average user stereotype. This information is all prior, and is trivially available from the EKB.
Once something is known about the user, this information can be used as in conventional stereotyping approaches to ``trigger'' the application of a stereotype. Say, for instance, that the user has indicated, or we have reasoned by plan recognition, that he is a faculty member. The conventional approach would have us apply the `faculty-member stereotype' and attribute thereby to the user the defaults therein. The EKB approach will accomplish the same thing.
A difference between the conventional and the EKB approach is that there is no need in the EKB approach to foresee all the possible combinations of triggers; the EKB can provide, on demand, a `stereotype' for any conceivable combination of `triggers.' Thus, we may encounter in the course of interaction with the user population, an individual who is both a faculty member and a smoker, as well as an avid baseball fan. Surely it is unreasonable to expect a knowledge engineer to prepare a stereotype for `faculty-smoker-baseball.' Although multiple inheritance from separate stereotypes for faculty, smoker and baseball can accomodate this particular example, considerable attention must still be given to preference for attribute values when these disagree between stereotypes. The EKB is immune from these problems, and provides values for any attribute on demand.
In this subsection I describe a simpler approach that does not require the mechanism of an EKB, but which does not capture all the dependencies that may exist in a user population.
The knowledge engineer can ``seed'' the representation of prior
probabilities. For instance, he can set the value of the random variable
describing whether a user is a student or faculty member as follows: ; instead of storing priors as single real numbers, he can store pairs
of numerators and denominators whose quotients are the priors. The value
of the denominator can be interpreted as the size of the sample space from
which the prior is determined, and the numerator as the number of positive
instances found in that sample.
Over usage, additional instances are encountered, which can be seen as extending the sample space, and consequently having an effect on the values of the priors. A positive instance increments the numerator of the corresponding alternative, and the denominator of all alternatives.
There are obviously many pairs of numbers which, when divided, result in
the same quotient. For instance, there are many such that
, but the magnitudes of the numbers chosen to represent the
priors influences their sensitivity to new information. The larger the
denominator applied to the representation of a random variable, the more
confidence implied, and the less sensitive it will be to observations.
(See [25].)

The expressivenes of the representation language is affected by the following choices in sometimes subtle ways.
The preceding investigation has been formulated in terms of a system which minimizes positive cost. This approach encourages designs which are ``minimal'' with respect to the number of design assumptions made.
Another possibility is to implement a system which maximizes positive utility. This approach produces designs which are ``maximal'' in the number of design assumptions. This distinction is important because we are using simple summation to calculate the total cost/utility of a presentation.
These tradeoffs are important for a number of reasons; they affect the expressiveness of the language, as well as the computational complexity of the implementation. It is not quite the same thing to say that `presentations that do not depict sports events are preferred over those which do' and to say that `do not show sports events.' The first can be expressed with a system that maximizes positive utility by setting the utility of sports events to be lower than any other utilities. The second requires an ability to express negative bias (`user hates sports') which cannot be represented in a system that uses a monotonic utility function. We give up this expressive power in favor of monotonicity because non-monotonic utility functions would require the system to generate the entire search tree before reporting the best solution. As the space may be very large, this would be an unacceptable strategy in our presentation domain, which currently benefits from a best-first search strategy.