Clustering Contextual Facial Display Sequences

by Jesse Hoey

We describe a method for learning classes of facial motion patterns from video of a human interacting with a computerized embodied agent. The method also learns correlations between the uncovered motion classes and the current interaction context. Our work is motivated by two hypotheses. First, a computer user's facial displays will be context dependent, especially in the presence of an embodied agent. Second, each interactant will use their face in different ways and for different purposes. Our method describes facial motion using optical flow over the entire face, projected onto the complete orthogonal basis of Zernike polynomials. A context-dependent mixture of hidden Markov models (cmHMM) clusters the resulting temporal sequences of feature vectors into facial display classes. We apply the clustering technique to continuous video sequences in which a single face is tracked and spatially segmented. We discuss the classes of patterns uncovered for a number of subjects.
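
To make the feature extraction step concrete, the following is a minimal sketch of projecting a dense flow field onto a Zernike basis. It is not the implementation used in this work. It assumes the tracked face region has been resampled to a square grid mapped onto the unit disk, that the two flow components are projected independently, and that the coefficients are found by least squares. The function names (`zernike`, `zernike_basis`, `flow_features`), the grid size, and the maximum polynomial order are hypothetical choices made for illustration.

```python
import numpy as np
from math import factorial


def zernike(n, m, rho, theta):
    """Evaluate the Zernike polynomial Z_n^m at polar coordinates (rho, theta)."""
    am = abs(m)
    radial = np.zeros_like(rho)
    for k in range((n - am) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + am) // 2 - k)
                    * factorial((n - am) // 2 - k)))
        radial += coeff * rho ** (n - 2 * k)
    # Even (cosine) terms for m >= 0, odd (sine) terms for m < 0.
    angular = np.cos(am * theta) if m >= 0 else np.sin(am * theta)
    return radial * angular


def zernike_basis(size, max_order):
    """Sample all Zernike polynomials up to max_order on a size x size grid.

    Returns the basis matrix (pixels inside the unit disk x basis functions)
    and the boolean mask selecting those pixels.
    """
    ys, xs = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho = np.hypot(xs, ys)
    theta = np.arctan2(ys, xs)
    inside = rho <= 1.0
    columns = [zernike(n, m, rho[inside], theta[inside])
               for n in range(max_order + 1)
               for m in range(-n, n + 1, 2)]
    return np.stack(columns, axis=1), inside


def flow_features(flow_u, flow_v, basis, inside):
    """Project horizontal and vertical flow onto the basis (assumed: least squares).

    Least squares is used because the sampled polynomials are only
    approximately orthogonal on a discrete pixel grid.
    """
    targets = np.stack([flow_u[inside], flow_v[inside]], axis=1)
    coeffs, *_ = np.linalg.lstsq(basis, targets, rcond=None)
    # Feature vector: coefficients for u followed by coefficients for v.
    return coeffs.T.ravel()


# Hypothetical usage with a synthetic flow field for one frame.
basis, inside = zernike_basis(size=32, max_order=4)
flow_u = np.random.default_rng(0).normal(size=(32, 32))
flow_v = np.zeros((32, 32))
features = flow_features(flow_u, flow_v, basis, inside)
```

Under these assumptions each video frame yields one fixed-length feature vector, and the vectors for consecutive frames form the temporal sequences that a sequence model such as the cmHMM would cluster.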
