This talk presents a method for learning decision-theoretic models of facial expressions and gestures from video data. We take the meaning of a facial display or gesture, to an observer, to lie in its relationship to actions and outcomes. An agent wishing to capitalize on these relationships must distinguish facial displays and gestures according to their affordances: how they help the agent maximize utility. We show how such an agent can learn relationships between observations of a person's face and gestures, the context, and its own actions and utility function. The agent can use the learned model to determine which displays are important for choosing actions that maximize the expected utility of possible outcomes. Further, the learned model indicates which facial display and gesture classes are redundant for achieving value. Redundant states can be pruned, leading to value-directed structure learning of the observation function: the model learns which displays and gestures are useful to distinguish, and needs no prior information about the number of classes of displays and gestures present in the data. We show results using this model in a simple gestural robotic control problem and in a simple card game played by two human players.
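The value-directed pruning idea can be illustrated with a toy sketch. Everything below is a hypothetical construction, not the talk's actual algorithm or data: display classes ("nod", "shake", "blink"), their probabilities, and the utility table are invented for illustration. The sketch greedily merges observation classes whenever the merger does not reduce the best achievable expected utility, so classes that afford the same best action collapse into one.

```python
import itertools

# Hypothetical toy model: the agent observes one of several display
# classes, then picks an action; UTILITY[display][action] is the payoff.
P_DISPLAY = {"nod": 0.4, "shake": 0.4, "blink": 0.2}
UTILITY = {
    "nod":   {"advance": 1.0, "retreat": 0.0},
    "shake": {"advance": 0.0, "retreat": 1.0},
    "blink": {"advance": 1.0, "retreat": 0.0},  # same best action as "nod"
}

def expected_value(partition):
    """Best expected utility when displays are only distinguishable
    up to the given partition (a list of sets of display names)."""
    value = 0.0
    for block in partition:
        # The agent must pick one action per block; take the best one.
        value += max(
            sum(P_DISPLAY[d] * UTILITY[d][a] for d in block)
            for a in ("advance", "retreat")
        )
    return value

def prune_redundant(displays, eps=1e-9):
    """Greedily merge display classes whose merger loses no value."""
    partition = [{d} for d in displays]
    merged_something = True
    while merged_something:
        merged_something = False
        base = expected_value(partition)
        for i, j in itertools.combinations(range(len(partition)), 2):
            candidate = [b for k, b in enumerate(partition) if k not in (i, j)]
            candidate.append(partition[i] | partition[j])
            if expected_value(candidate) >= base - eps:
                partition = candidate
                merged_something = True
                break
    return partition

print(prune_redundant(list(P_DISPLAY)))
```

In this toy setting "nod" and "blink" afford the same optimal action, so they merge into one class while "shake" stays separate: the number of useful display classes (two) emerges from the utilities rather than being fixed in advance, which is the spirit of the value-directed structure learning described above.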