One criticism of systems which attempt to model their users is their inscrutability to those users. If the system acts infelicitously, failing to meet the needs of the user because of an error in the user model, a desirable property of the system is scrutability. The user should be able to reason, for example: ``The system is giving me information I don't need. Hmmm... maybe it thinks I'm a student, and not a faculty member.'' The user should be able to ``debug'' the user model in this way. Supporting this kind of interaction imposes interesting constraints on the reasoning techniques employed by the system.
Users should also be given the means by which to express their dissatisfaction with a presentation. This expression can be interpreted as either a request for the next best presentation, or as a request that the system select the next most plausible model of the user.
Our interaction paradigm supports both the scrutability and the dissent conditions described above.
In calculating a model that represents the user, the system makes observations upon which further reasoning is conditioned. These observations can be of a number of different types from different sources, but the most important for our purposes here are the observations of the user's interaction at the user interface (described later). Keystrokes and mouse actions can be interpreted using plan-recognition techniques to build elements of the user model, although we have not yet implemented this. Instead, we provide the user with direct access (with a graphical user interface) to salient elements of the user model, as well as with the ability to criticize these elements. It is this interaction which is used to build the user model.
A prior model is first shown to the user. This is the user model constructed initially by the system using only the prior probabilities associated with recognition assumptions, and before the user has seen any presentations designed from the model. The user provides feedback to the system about the validity of some or all of the prior assumptions, and the system uses this new information (observations) to calculate a new prior model of the user, which is then re-displayed. This cycle of prior scrutiny may repeat, until the user finally instructs the system to design a presentation.
After having seen a presentation, and perhaps having been dissatisfied with it, the user can continue to scrutinize the model, now known as a posterior model, in a cycle of activity we refer to as posterior scrutiny. Changes made by the user to the model should result in changes, hopefully improvements, to the presentation.
Recognition assumptions are made in the course of these abductive processes, and the system always works on the most likely model until it is proven, or until another model in the queue becomes more likely.
A difference between prior and posterior scrutiny is that a user-driven prior update, from one model $M$ to a revised model $M'$, may not change the best design, or may result in a design which maps to the same presentation; in posterior update, however, the transition from $M$ to $M'$ must be accompanied by a different presentation, if the user is to continue to feel that the system is paying attention to the interaction.
For instance, if the system displays to the user the assumption that the user is a faculty member, and the user does not critique the assumption, the system may upgrade the status of the assumption if the system believes that the assumption has been seen and understood by the user. This is reasonable if: the window in which the assumption is displayed is not obscured, the text in the window is clearly rendered and is large enough, the user is not distracted by other events on the desktop, the user is not distracted by other events in the environment (babies crying, cars colliding, etc.), and so on.
The case where a user critiques assumption $a_1$ but does not critique assumption $a_2$ is of particular interest. Assumption $a_1$ can be upgraded because of the user's direct action, but confidence in assumption $a_2$ can also be upgraded based on the user's inaction, on the grounds that the user was evidently attending to the appropriate window (because the user did critique assumption $a_1$).
Various display strategies can be used to draw the user's attention to one or more of the displayed assumptions. A particular assumption can be highlighted (with color, font, or animation, for instance) to increase the likelihood that the user is attending to the relevant portion of the desktop, and to the relevant part of the window. Such perceptually salient display techniques can increase the reliability of the system's assumptions about the user, but require that the system model such things as the media capabilities of the user's display and interaction devices. The system would also benefit from knowing whether the window in question is partially or totally obscured by other desktop objects, which would require some degree of integration between the system and the operating system or window manager. We have not conducted a deep analysis of such desiderata for future operating systems, but bring them up here because they are compatible with our interaction paradigm, and can be represented in our reasoning framework. Nor have we made empirical investigations to decide which are the most effective display techniques, but we make what we think are reasonable decisions until these results become available.
Simple Bayesian analysis can be performed to calculate posterior probabilities for the recognition assumables that are used. The idea that we seek to capture here is that the user's actions at the interface are conditioned on the state of the world, and in particular upon the user's model of himself and the contents of the user model display window. For simplicity, a separate Bayesian network can be derived from the schema shown in Figure -a and used to represent these relationships for each random variable of interest.
Some of the preceding considerations are represented in the Bayesian network illustrated in Figure -b, where User-Type, User-Action and Display are random variables with a conditional probability distribution over the following states:
Equation 1 describes the expression corresponding to the network in Figure -a. This expression can be evaluated to determine the posterior (after an action by the user) probability of any random variable (assumable) by replacing $X$ and $x$ with the variable and the value of interest, respectively (as per Figure -b). In addition to the prior distribution of $X$, the conditional probabilities $P(\textit{User-Action} \mid X, \textit{Display})$ are needed.
The entries in this contingency table can be determined empirically; we use what we think are reasonable values. Some of the more important ones are:
For instance, the posterior probability that the user is of type $t$ is given by Equation 2, where $t$ is one of faculty, staff, or student.
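For concreteness, the following is a minimal sketch, in Python, of how such a posterior can be evaluated by Bayes' rule. The states, priors, and contingency-table entries are hypothetical, chosen only to illustrate the mechanics: inaction over a visible window upgrades the displayed assumption, while inaction over an obscured window carries almost no information.
\begin{verbatim}
# Hypothetical posterior update for User-Type, in the spirit of
# Equations 1 and 2. All numbers are illustrative.

USER_TYPES = ["faculty", "staff", "student"]

# Prior distribution over User-Type (e.g., as supplied by the EKB).
prior = {"faculty": 0.2, "staff": 0.3, "student": 0.5}

# P(User-Action | User-Type, Display), for the case where the displayed
# assumption is `the user is a faculty member'. A non-faculty user who
# can see the window is likely to critique it; if the window is
# obscured, inaction is uninformative.
p_action = {
    ("no-critique", "faculty", "visible"):  0.90,
    ("critique",    "faculty", "visible"):  0.10,
    ("no-critique", "staff",   "visible"):  0.40,
    ("critique",    "staff",   "visible"):  0.60,
    ("no-critique", "student", "visible"):  0.40,
    ("critique",    "student", "visible"):  0.60,
    ("no-critique", "faculty", "obscured"): 0.95,
    ("critique",    "faculty", "obscured"): 0.05,
    ("no-critique", "staff",   "obscured"): 0.95,
    ("critique",    "staff",   "obscured"): 0.05,
    ("no-critique", "student", "obscured"): 0.95,
    ("critique",    "student", "obscured"): 0.05,
}

def posterior(action, display):
    """P(User-Type = t | action, display), by Bayes' rule."""
    joint = {t: prior[t] * p_action[(action, t, display)]
             for t in USER_TYPES}
    z = sum(joint.values())  # normalizing constant
    return {t: p / z for t, p in joint.items()}

print(posterior("no-critique", "visible"))   # faculty upgraded to 0.36
print(posterior("no-critique", "obscured"))  # priors nearly unchanged
\end{verbatim}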
Another way to look at the issue of perceptual salience is to note that the system is engaged in a dialog with the user, and that the presentation of the user model window is a communicative act by the system. The perlocutionary effect of these acts is knowledge, on the part of the user, of the user model. The intuition is that the likelihood of achieving this communicative goal is enhanced by using appropriate presentation techniques to highlight the most important parts of the message, and that the use of such techniques licenses an increased commitment by the system to the beliefs which follow from successful communication.
A problem with displaying the model to the user in support of scrutability is that the model may be huge, if not infinite. Certainly, there may be too many recognition assumptions to be effectively displayed on a screen (cf. perceptual salience), and far too many for quick assimilation by users. Assimilation is a cognitive task characterized by focused, serially directed visual attention, and research in psychology has shown that the time taken for such tasks is proportional to the number of objects in the visual field.
A solution is to display a salient subset $M_s$ of the model $M$, to which we refer as the salient model, consisting of those assumptions which have the greatest effect on the design (and therefore upon the presentation as well). Sensitivity analysis is performed to select these critical recognition assumptions. The cardinality of $M_s$ can be varied to maintain an appropriate pace of interaction; if the user is taking too long to evaluate the alternatives, the number of alternatives in the next iteration can be reduced, and vice versa. In addition, if the alternatives are ranked, [graphical] display techniques can be used to render this ordering to the viewer.
In the following, let $M$ represent the currently most plausible model, and $D$ the best design that is consistent with $M$. We require a total ordering by which to rank the assumptions of $M$ for display to the user. More precisely, we seek a sensitivity function $\sigma$ over assumptions; the values $\sigma(a)$ can be sorted to order the assumptions $a$. The examples below refer to the following disjoint sets of recognition assumptions:
Some useful notation follows.
We want to interpret $\sigma(a)$ as the sensitivity of $D$ to the assumption $a$. There are many ways to formulate, and eventually to program, this ranking function.
One approach is to rank the assumptions of the current model according to the expected change in the cost of the best design if $M$ is replaced by its peer models, as in the opaque reading of Equation 3, wherein $D$ is read as $D(M_i)$, the best design for each peer model $M_i$, and the cost terms are the costs of these respective designs.
It might be argued that, although this is a reasonable thing for the reasoning engine to calculate, the cost of the best design for one model is incommensurable with the cost of the best design for another model (remember to apply the opaque reading here).
The same approach can be taken transparently, albeit with a loss in directness insofar as the reasoning process is concerned. Now we're talking about evaluating Equation 3 in its transparent reading, wherein $D$ is read as $D(M)$, the fixed best design for the current model, and its cost is evaluated in the context of each peer model $M_i$.
In the opaque evaluation, $D = D(M_i)$ for each of the peer models $M_i$. In the transparent evaluation, $D = D(M)$, a constant, i.e., the best design for the most likely model $M$. The problem with the transparent approach to evaluation is that we do not, in our minimal AI reasoning framework, represent all the utilities required to evaluate this expression. This theme emerges at various points throughout this article: we do not want to list all these numbers, and it is not a failing of the approach that the quantities in question are unavailable. Nevertheless, in general, $c(D(M) \mid M_i)$, the cost of the current best design in the context of a peer model, may simply be undefined.
We may pursue a reasonable surrogate for the missing $c(D(M) \mid M_i)$. One way is to calculate $D(M_i)$ for each peer $M_i$ according to the opaque interpretation of Equation 3, and then to evaluate for each peer a metric based on how different the [expected cost of the] best design $D(M_i)$ is from the current best design $D(M)$. Such a metric can be as simple as the count of matching design assumptions, or may involve something like Equation 4.
Equation 4 reads: the cost of $D(M)$ given $M_i$ is the cost of the irrelevant part of the presentation (represented by the relevance assumptions in $D(M)$ which are not in $D(M_i)$), plus the opportunity cost of not having made the right presentation (represented by the cost of the relevance assumptions that are in $D(M_i)$ but not in $D(M)$), less the cost of the intersection of $D(M)$ with $D(M_i)$ (the best design for model $M_i$, read opaquely). In other words, the two positive terms reflect penalties for making the wrong presentation; the first reflects a penalty for the irrelevant components of the presentation, the second reflects a penalty for the omission of relevant components. Replacing the cost term in Equation 3 with the expression in Equation 4 results in what we consider plausible values for all the required quantities (further justification for this approach appears elsewhere in this article).
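As a concrete illustration, the following sketch implements such a surrogate cost over designs modelled simply as maps from relevance assumptions to costs; the assumption names and cost values are hypothetical.
\begin{verbatim}
# Surrogate for c(D(M) | M_i) in the spirit of Equation 4: penalty for
# irrelevant content, plus opportunity cost of omitted relevant
# content, less the shared (correct) part. Designs are dicts mapping
# relevance assumptions to hypothetical costs.

def surrogate_cost(design_m, design_mi):
    irrelevant = sum(c for a, c in design_m.items()  if a not in design_mi)
    omitted    = sum(c for a, c in design_mi.items() if a not in design_m)
    shared     = sum(c for a, c in design_m.items()  if a in design_mi)
    return irrelevant + omitted - shared

d_current = {"research-news": 2.0, "seminar-schedule": 1.0}  # D(M)
d_peer    = {"seminar-schedule": 1.0, "sports-events": 3.0}  # D(M_i)
print(surrogate_cost(d_current, d_peer))  # 2.0 + 3.0 - 1.0 = 4.0
\end{verbatim}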
Still other approaches might involve comparisons of the derivation trees of designs for alternate models.
We are still exploring different ranking functions. Once $\sigma$ is calculated, the rest of the procedure is the same.
Sort $\sigma(a)$ for all $a \in M$, and let the salient model $M_s$, referred to above, consist of the first $n$ assumptions in this sorted list. These are the assumptions to which the design, given the current user model, is most sensitive.
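In outline, the selection step might look like the following sketch, where the sensitivity scores stand in for any of the $\sigma$ formulations discussed above; the assumption names and score values are hypothetical.
\begin{verbatim}
# Rank assumptions by sensitivity and keep the n most sensitive as the
# salient model M_s.

def salient_model(assumptions, sigma, n):
    return sorted(assumptions, key=sigma, reverse=True)[:n]

scores = {"user-is-faculty": 0.9, "user-likes-sports": 0.7,
          "user-has-tenure": 0.4, "user-is-male": 0.2,
          "user-reads-email": 0.1}

print(salient_model(list(scores), scores.get, n=3))
# ['user-is-faculty', 'user-likes-sports', 'user-has-tenure']
\end{verbatim}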
In our interaction paradigm, we show the user the assumptions in $M_s$, in the context of the disjoint sets to which they belong; i.e., for each $a \in M_s$, we display $a$ along with the names of all the assumables in the disjoint set to which $a$ belongs. For example, if $a$ is the assumption that the user is a faculty member, the user sees the set of names faculty, staff, student, with the actual assumption highlighted.
We set $n$, the number of hypotheses displayed, to some small integer like 5 or 7. In sample implementations so far, we use simple text attached to radio buttons, so that users see not only the assumption that was made, but also the other members of the disjoint set to which the assumption belongs; a simple click on a radio button instructs the system to recalculate the user model with the new assumption. (See Section 6 for more on implementation approaches.)
Joint sensitivity analysis is not performed because: 1) single faults are dramatically more likely than multiple faults, 2) accounting for multiple faults is exponential, and 3) the effects of multiple faults would be very difficult for users to track, imposing a huge cognitive burden that compromises our scrutability requirement.
The system also makes design assumptions, always choosing the lowest-cost alternative that is consistent with the model and with the design so far, which results in a design as follows, where $M$ is the most likely model:
Our approach is to work incrementally, in best-first order, on the lowest-cost design. Other approaches are, of course, possible; one that has been suggested is to design based upon a notion of expected value, or expected cost.
Decision theory has been applied by others to design tasks under uncertainty. Some of this literature (see Cheeseman for a discussion) argues that the best design is the one that results from averaging over all possible [peer] models (probabilistically weighted), i.e., that an expected cost function of the following form (Equation 5) is to be minimized to find the best presentation design:
\[ E[c(D)] = \sum_i P(M_i)\, c(D \mid M_i), \]
where $c(D \mid M_i)$ is the cost of design $D$ in the context of model $M_i$. There are reasons in our application why this may be the wrong thing to do.
First of all, one of the goals of this research is to avoid the formidable knowledge engineering task of developing and justifying a complete contingency table of joint probabilities, and a complete table of utilities. Both of these are required to evaluate Equation 5 in its full generality.
In addition, using expected cost to determine the design would place a remarkable burden upon the user, who would in effect be asked to model all possible users of the system; in order to be able to ``understand'' infelicitous presentations and correct the system's misconceptions, the user would have to know about the utilities/costs of elements of the presentation for other, hypothetical users. The user, for instance, upon presentation of material about a basketball game which he finds completely irrelevant, would have to judge the value of a presentation of sports and leisure material to all of faculty, student and staff. In particular, the following can occur. Let the disjoint set be faculty, staff, student, and assume costs under which research material is extremely costly (aversive) for students but cheap for faculty, while sports material is cheap for students and moderately costly for faculty. The user is a faculty member, the system has assumed that the user is a faculty member, and this assumption is clearly indicated to the user. Averaging over the peer models can nevertheless give the sports presentation a lower expected cost than the research presentation, and the faculty member is shown the basketball game. There is no indication to the user why he is watching an irrelevant basketball game, because this behavior is due to the high aversion of students to research, a value not shown to the user. The behavior of the system is inscrutable.
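The following sketch works through this scenario with made-up probabilities and costs; the point is only that the averaged criterion can prefer a presentation that is wrong for the actual, correctly identified user.
\begin{verbatim}
# Expected-cost selection (Equation 5) with hypothetical numbers.
# The user is faculty, and faculty is the most likely model, yet the
# averaged criterion selects the basketball game because of the high
# aversion of students to research material.

models = {"faculty": 0.5, "staff": 0.2, "student": 0.3}  # P(M_i)
cost = {                                                 # c(D | M_i)
    ("research-talk", "faculty"): 1.0,
    ("research-talk", "staff"):   4.0,
    ("research-talk", "student"): 9.0,  # students' aversion to research
    ("basketball",    "faculty"): 5.0,
    ("basketball",    "staff"):   3.0,
    ("basketball",    "student"): 2.0,
}

def expected_cost(design):
    return sum(p * cost[(design, m)] for m, p in models.items())

print(expected_cost("research-talk"))  # 0.5*1 + 0.2*4 + 0.3*9 = 4.0
print(expected_cost("basketball"))     # 0.5*5 + 0.2*3 + 0.3*2 = 3.7
\end{verbatim}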
The two approaches often select the same presentation, since the expected cost is weighted towards the most likely model (the single largest probability term in both approaches is the one that applies to the most likely model), but this is not the issue. What is of concern to us here is the behavior of the system, and its impact on user interaction, when the presentation is inappropriate because of an error in the user model.
A question often left unanswered in probabilistic implementations is where to obtain the values of the prior probabilities. The ``best'' user model is the most likely one, evaluated as described above, in terms of the priors. If these priors were only as good as the a priori guesses of the knowledge engineer, the results would be questionable indeed. Even if they are based upon an accurate assessment of the distribution of the user population, peculiarities will remain. For instance, the reader will have noted that the random variables in this work have been treated as independent. This means that important relationships can sometimes go unrepresented: even if the priors for sex and category accurately describe the distribution of members of the department of computer science, they do not reflect the striking datum that all but one of the faculty members are male. Nor would these representations be able to adapt to changes over time in the user population without intervention by a knowledge engineer.
It is possible, within the representation described, to encode dependencies between random variables. We could render explicit the relationship between sex and category, for instance, and continue to add other dependencies we consider important. The problem is that there may be many such relationships, and it will be difficult to decide which are important enough to encode and which can be safely ignored. Not only is this a knowledge engineering task which we wish to avoid, but the computational complexity of the resulting Bayesian network rises dramatically with the number of arcs representing dependence.
Instead, we obtain priors from an episodic knowledge base (EKB) that tracks the user population over time. Priors thus obtained will reflect actual relationships in the user population, with a computational complexity much lower than that of a complete Bayesian analysis, and the significant costs of error-prone knowledge engineering are avoided entirely.
Prior determination can be implemented as a procedural attachment in Prolog. When the reasoning agent requires a prior value, a call is made to the EKB. P(male) is originally supplied by the EKB as simply the number of individuals in the EKB who are male, divided by the total number of individuals. Later, after the user has indicated that he is a faculty member, the EKB is queried for the value of P(male|faculty): the number of individuals in the EKB who are male faculty members, divided by the number of individuals who are faculty members.
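Transliterated to Python for illustration (the attachment itself is in Prolog), the query might look like the following sketch; the episodic records and attribute names are hypothetical.
\begin{verbatim}
# Priors computed directly from episodic records by counting.

EKB = [
    {"sex": "male",   "category": "faculty"},
    {"sex": "male",   "category": "faculty"},
    {"sex": "male",   "category": "student"},
    {"sex": "female", "category": "student"},
    {"sex": "female", "category": "staff"},
]

def prior(attr, value, given=None):
    """P(attr = value | given), from EKB counts."""
    pool = [e for e in EKB
            if all(e[k] == v for k, v in (given or {}).items())]
    return sum(e[attr] == value for e in pool) / len(pool)

print(prior("sex", "male"))                           # 3/5 = 0.6
print(prior("sex", "male", {"category": "faculty"}))  # 2/2 = 1.0
\end{verbatim}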
We consider the EKB as a kind of user model server, a repository of information about the user population that can be queried by applications serving that population. Multiple local EKBs could be located within an institution to serve different subsets of the complete user population, and global EKBs could serve queries over the entire population by querying all of the relevant local EKBs.
The EKB is a separate agent that communicates with other agents via TCP/IP. Its design is described elsewhere. At the beginning of a session, and whenever required after additional observations are made about the user, the EKB is queried for prior values. At the end of a session, the EKB is supplied with a new data point, a new individual; the information supplied is in fact the current, most likely user model, consisting of the highest-probability assumptions. This data point in the EKB can move over time, with continued experience with the same user.
Since the information in the EKBs is intended to be persistent, it is a design issue of some social impact whether the individuals stored in the EKB should be identifiable. If they are, then persistent models of individuals would yield highly accurate priors at the start of a new session with a user who has already used the system, or who has used another system served by the same EKB. If not, then the system must begin to model the user anonymously, from scratch, albeit with the help of a growing EKB. Individual tracking will keep the number of individuals stored in the EKB down to the number of actual users accessing the systems served by the EKB, while anonymous storage will result in a new entry in the EKB for every session by every user at every system served by the EKB. There are clearly significant advantages to individual tracking, but these must be weighed against its (still unclear) social repercussions.
There is a relationship between the notion of user stereotypes, well-entrenched in the user modelling literature, and the EKB approach to the determination of prior probabilities. This relationship is hinted at by Rich: ``...stereotypes can be viewed as concepts, and then they can be learned with statistical concept-learning methods.'' Our approach can be seen in just this light.
The ``best model'' described thus far is itself a kind of stereotype: the average-user stereotype. This information is all prior, and is trivially available from the EKB.
Once something is known about the user, this information can be used as in conventional stereotyping approaches to ``trigger'' the application of a stereotype. Say, for instance, that the user has indicated, or we have reasoned by plan recognition, that he is a faculty member. The conventional approach would have us apply the `faculty-member stereotype' and attribute thereby to the user the defaults therein. The EKB approach will accomplish the same thing.
A difference between the conventional and the EKB approach is that there is no need in the EKB approach to foresee all the possible combinations of triggers; the EKB can provide, on demand, a `stereotype' for any conceivable combination of `triggers.' Thus we may encounter, in the course of interaction with the user population, an individual who is both a faculty member and a smoker, as well as an avid baseball fan. Surely it is unreasonable to expect a knowledge engineer to prepare a stereotype for `faculty-smoker-baseball.' Although multiple inheritance from separate stereotypes for faculty, smoker and baseball can accommodate this particular example, considerable attention must still be given to preference for attribute values when these disagree between stereotypes. The EKB is immune from these problems, and provides values for any attribute on demand.
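A sketch of such an on-demand `stereotype', again with hypothetical episodic records: conditioning the counts on all known triggers at once yields defaults for any remaining attribute, with no pre-engineered stereotype and no inheritance-conflict resolution.
\begin{verbatim}
# On-demand `stereotype' for an arbitrary trigger combination.

EKB = [
    {"category": "faculty", "smoker": "yes", "fan-of": "baseball",
     "likes-stats": "yes"},
    {"category": "faculty", "smoker": "yes", "fan-of": "baseball",
     "likes-stats": "no"},
    {"category": "faculty", "smoker": "no",  "fan-of": "hockey",
     "likes-stats": "no"},
    {"category": "student", "smoker": "no",  "fan-of": "baseball",
     "likes-stats": "yes"},
]

triggers = {"category": "faculty", "smoker": "yes", "fan-of": "baseball"}
pool = [e for e in EKB if all(e[k] == v for k, v in triggers.items())]

# The default for any remaining attribute comes from the conditioned
# pool of individuals.
p = sum(e["likes-stats"] == "yes" for e in pool) / len(pool)
print(p)  # 1/2 = 0.5
\end{verbatim}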
In this subsection we describe a simpler approach that does not require the mechanism of an EKB, but which does not capture all the dependencies that may exist in a user population.
The knowledge engineer can ``seed'' the representation of prior probabilities. For instance, instead of storing the prior for the random variable describing whether a user is a student or a faculty member as a single real number, he can store a pair of numerator and denominator whose quotient is the prior. The value of the denominator can be interpreted as the size of the sample space from which the prior is determined, and the numerator as the number of positive instances found in that sample.
As the system is used, additional instances are encountered; these can be seen as extending the sample space, and consequently as affecting the values of the priors. A positive instance increments the numerator of the corresponding alternative, and the denominator of all alternatives.
There are obviously many pairs of numbers which, when divided, result in the same quotient, but the magnitudes of the numbers chosen to represent a prior determine its sensitivity to new information. The larger the denominator applied to the representation of a random variable, the more confidence is implied, and the less sensitive the prior will be to observations.
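A minimal sketch of this representation, with illustrative seed counts, shows the sensitivity effect: two priors with the same quotient but different magnitudes respond differently to the same observation.
\begin{verbatim}
# Priors as <numerator, denominator> pairs, stored here as counts per
# alternative; the shared denominator is the total of the counts.

class CountedPrior:
    def __init__(self, counts):
        self.counts = dict(counts)

    def update(self, observed):
        # A positive instance increments the numerator of the observed
        # alternative; the shared denominator grows with the total.
        self.counts[observed] += 1

    def p(self, value):
        return self.counts[value] / sum(self.counts.values())

weak   = CountedPrior({"student": 7,  "faculty": 3})   # quotient 0.3
strong = CountedPrior({"student": 70, "faculty": 30})  # quotient 0.3
weak.update("faculty")
strong.update("faculty")
print(weak.p("faculty"))    # 4/11   ~ 0.364
print(strong.p("faculty"))  # 31/101 ~ 0.307
\end{verbatim}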
The expressiveness of the representation language is affected, in sometimes subtle ways, by the following choices.
The preceding investigation has been formulated in terms of a system which minimizes positive cost. This approach encourages designs which are ``minimal'' with respect to the number of design assumptions made.
Another possibility is to implement a system which maximizes positive utility. This approach produces designs which are ``maximal'' in the number of design assumptions. This distinction is important because we are using simple summation to calculate the total cost/utility of a presentation.
These tradeoffs are important for a number of reasons; they affect the expressiveness of the language, as well as the computational complexity of the implementation. It is not quite the same thing to say that `presentations that do not depict sports events are preferred over those which do' and to say that `do not show sports events.' The first can be expressed with a system that maximizes positive utility by setting the utility of sports events to be lower than any other utilities. The second requires an ability to express negative bias (`user hates sports') which cannot be represented in a system that uses a monotonic utility function. We give up this expressive power in favor of monotonicity because non-monotonic utility functions would require the system to generate the entire search tree before reporting the best solution. As the space may be very large, this would be an unacceptable strategy in our presentation domain, which currently benefits from a best-first search strategy.
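To make the computational point concrete, here is a minimal sketch of best-first (uniform-cost) search over design assumptions with a monotonic, additive cost function: because accumulated cost never decreases, the first complete design popped from the queue is optimal, and the search tree need not be generated in full. The choice points and costs are hypothetical.
\begin{verbatim}
import heapq

# Each slot of the presentation offers alternative design assumptions
# with positive costs; lower total cost is better.
CHOICES = [
    [("show-research-news", 1.0), ("show-sports-events", 4.0)],
    [("use-text", 0.5), ("use-video", 2.0)],
]

def best_design(choices):
    frontier = [(0.0, [])]  # (accumulated cost, partial design)
    while frontier:
        cost, design = heapq.heappop(frontier)
        if len(design) == len(choices):
            return cost, design  # first complete design is optimal
        for assumption, c in choices[len(design)]:
            heapq.heappush(frontier, (cost + c, design + [assumption]))

print(best_design(CHOICES))
# (1.5, ['show-research-news', 'use-text'])
\end{verbatim}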