Applications of Machine Learning in Computer Graphics and Animation

This page is (sporadically!) maintained by Michiel van de Panne.
It provides a list of papers where machine learning algorithms play a key role in solving the problem at hand.

Style Based Inverse Kinematics
SIGGRAPH 2004
Given example motion data, character poses are modeled as a probability distribution over the space of possible poses. The probability distribution is modeled using a Gaussian process latent variable model (GPLVM). Given a set of constraints on the character, e.g., left foot here and right hand there, the most likely pose that satisfies those constraints is found using a gradient-based optimization method. This is based on a differentiable likelihood objective function along with an added penalty term to enforce the constraints.
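The likelihood-plus-penalty optimization can be sketched on a toy problem. This is only an illustration under loud assumptions: a plain Gaussian over two joint angles stands in for the Gaussian process latent variable model, the character is a planar two-link arm, and the names `POSE_MEAN`, `POSE_STD`, `hand_position`, `solve_pose`, the penalty weight, and the step size are all hypothetical choices, not values from the paper.

```python
import numpy as np

POSE_MEAN = np.array([0.3, 0.8])   # hypothetical "typical" joint angles (radians)
POSE_STD = 0.5                     # hypothetical spread of the pose distribution

def hand_position(q):
    """Forward kinematics of a planar arm with two unit-length links."""
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

def objective_gradient(q, target, penalty_weight):
    x, y = hand_position(q)
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    # Jacobian of the hand position with respect to the two joint angles.
    jacobian = np.array([[-y, -s12],
                         [x, c12]])
    # Gradient of the negative log-likelihood of the pose (Gaussian stand-in).
    grad_likelihood = (q - POSE_MEAN) / POSE_STD ** 2
    # Gradient of penalty_weight * ||hand_position(q) - target||^2.
    grad_penalty = 2.0 * penalty_weight * jacobian.T @ (np.array([x, y]) - target)
    return grad_likelihood + grad_penalty

def solve_pose(target, penalty_weight=100.0, step=5e-4, n_steps=5000):
    """Plain gradient descent, started from the most likely unconstrained pose."""
    q = POSE_MEAN.copy()
    for _ in range(n_steps):
        q -= step * objective_gradient(q, target, penalty_weight)
    return q

target = np.array([1.0, 1.2])
pose = solve_pose(target)
print(pose, hand_position(pose))
```

Because the constraint enters as a penalty rather than a hard constraint, the hand ends up close to the target but not exactly on it; raising `penalty_weight` tightens the fit at the cost of a smaller stable step size.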
Segmenting Motion Capture Data into Distinct Behaviors
GI 2004
The goal here is to automatically segment a stream of human motion data into distinct high-level behaviors, e.g., walking, running, punching. Three techniques are implemented and compared: (1) Principal Component Analysis (PCA): take the most significant r components, then segment where the remaining projection error increases sharply; (2) Probabilistic PCA (PPCA): use PPCA to represent a set of poses as a Gaussian distribution, then use the Mahalanobis distance to help define segmentation points; (3) Gaussian Mixture Model: computed using EM-based clustering.
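A minimal sketch of the first technique, on synthetic data. The function names, the fitting-window size `n_fit`, and the `ratio` threshold are assumptions made for illustration; the paper's actual cut criterion may differ.

```python
import numpy as np

def projection_errors(frames, n_fit, r):
    """Fit an r-component PCA to the first n_fit frames and return the
    squared reconstruction error of every frame under that model."""
    mean = frames[:n_fit].mean(axis=0)
    centered = frames - mean
    # Rows of vt are the principal directions of the fitting window.
    _, _, vt = np.linalg.svd(centered[:n_fit], full_matrices=False)
    basis = vt[:r]
    reconstructed = centered @ basis.T @ basis
    return np.sum((centered - reconstructed) ** 2, axis=1)

def first_cut(frames, n_fit=50, r=2, ratio=10.0):
    """Return the first frame index whose error exceeds `ratio` times the
    mean error over the fitting window, or None if no frame does."""
    err = projection_errors(frames, n_fit, r)
    baseline = err[:n_fit].mean() + 1e-12
    over = np.nonzero(err[n_fit:] > ratio * baseline)[0]
    return int(over[0]) + n_fit if over.size else None

rng = np.random.default_rng(0)

def synthetic_behavior(n_frames):
    """Frames confined (up to small noise) to a random 2D subspace of a 6D pose space."""
    basis = rng.normal(size=(2, 6))
    return rng.normal(size=(n_frames, 2)) @ basis + 0.01 * rng.normal(size=(n_frames, 6))

# Two synthetic "behaviors" of 100 frames each, played back to back.
motion = np.vstack([synthetic_behavior(100), synthetic_behavior(100)])
cut = first_cut(motion)
print(cut)
```

On this synthetic stream the reported cut should fall at or just after frame 100, where the second behavior leaves the subspace fitted to the first.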
A Data-Driven Reflectance Model
SIGGRAPH 2003
Builds data-based reflectance models that allow for easy interpolation and extrapolation, in a perceptually meaningful way, to create new reflectance models. Each of 104 example reflectance models has roughly 4,000,000 parameters; these are first reduced to a 45-dimensional vector using PCA. Non-linear dimensionality reduction ("manifold charting" in this case) is then applied to map the data onto a 10-dimensional manifold. Some basic physics constraints still need to be taken into account when back-projecting from the manifold to the full space.
Machine Learning for Computer Graphics: A Manifesto and Tutorial
Pacific Graphics 2003
An overview of what machine learning has to offer the graphics community, with an emphasis on Bayesian techniques. Common misconceptions about machine learning are addressed and a tutorial on Bayesian reasoning is included.
Motion Synthesis from Annotations
SIGGRAPH 2003
The core of this paper revolves around how to efficiently cut-and-paste together chunks of motion in order to satisfy annotation and position constraints. In this context, annotations are labels that are assigned to parts of motions, such as "wave", "pickup", or "crouch". Machine learning is used to help automate the addition of annotations to the motions. Given motion examples that are annotated by hand, a support vector machine classifier is constructed for each type of motion.
Motion Texture: A Two-Level Statistical Model for Character Motion
SIGGRAPH 2002
Given example human motion data (dancing motion in this case), a two-level statistical model is developed for the motion. The bottom level consists of a collection of "motion textons", and the top level consists of a matrix of transition probabilities between the motion textons. Each motion texton is modeled as a linear dynamical system. There is a chicken-and-egg problem in deriving this two-level model from the data. Given motion data that is already segmented into various motion textons, it would be easy to compute the LDS "motion texton" model. Similarly, given a set of LDS motion texton models, it would be easy to segment the motion into the most likely motion textons. This chicken-and-egg problem is resolved by obtaining a good (greedy) initial guess and then using EM. PCA is also applied to reduce the dimensionality of the motion data.
Composable Controllers for Physics-Based Character Animation
SIGGRAPH 2000
Developing control strategies for basic human or robot motions, such as taking a step forward or even maintaining balance while standing, is a difficult problem, so a common strategy has been to develop individual controllers by hand for these tasks. This paper looks at how multiple such controllers can be made to work together. An oracle is developed for each controller which, given the current state of the body, determines whether the controller is competent to handle that state. For example, a controller that maintains a balanced standing posture needs to know when to "give up" after a strong push. Support vector machines are used to develop such will-succeed/will-fail oracles for a set of controllers.
Style Machines
SIGGRAPH 2000
A "stylistically parameterized" HMM is extracted from example human motion data, in this case for classically trained dance. A maximum-likelihood HMM is estimated from the data using EM. Extra terms are added to the learning objective function so that a stylistically parameterized model is produced, rather than a single model fitted to all the data. This requires specification and initialization of a generic model that will then be parameterized. New motions are then synthesized as maximum-likelihood paths through the HMM states, which have two Gaussian distribution models, one for the character joint angles and one for the joint angle velocities.
Sampling Plausible Solutions to Multi-body Constraint Problems
SIGGRAPH 2000
A Markov-Chain Monte Carlo algorithm is applied to sample from a space of plausible animations that satisfy some particular goal. The "probability" of an animation is modeled as the product of a "plausibility" term and a constraint-satisfaction term. The constraint terms and the Markov-chain proposal distributions both require careful design and experimentation.
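The product-of-terms formulation can be sketched with a basic Metropolis sampler. This is a toy stand-in, not the paper's sampler: the "animation" is a single launch speed for a projectile fired at 45 degrees, and the plausibility term, constraint term, target distance, and proposal step size are all hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def landing_distance(speed):
    """Range of a projectile launched at 45 degrees on flat ground."""
    return speed ** 2 / G

def log_probability(speed, target=10.0):
    # Plausibility: the launch speed stays near a nominal 9 m/s (assumed).
    log_plausibility = -0.5 * ((speed - 9.0) / 2.0) ** 2
    # Constraint satisfaction: the object should land near the target (assumed tolerance 0.5 m).
    log_constraint = -0.5 * ((landing_distance(speed) - target) / 0.5) ** 2
    # The log of the product of the two terms is the sum of their logs.
    return log_plausibility + log_constraint

def metropolis(log_p, start, n_steps, step_size, rng):
    samples = np.empty(n_steps)
    current, current_lp = start, log_p(start)
    for i in range(n_steps):
        proposal = current + step_size * rng.normal()
        proposal_lp = log_p(proposal)
        # Symmetric proposal: accept with probability min(1, p(new) / p(old)).
        if rng.random() < np.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples

rng = np.random.default_rng(1)
chain = metropolis(log_probability, start=9.0, n_steps=5000, step_size=0.3, rng=rng)
speeds = chain[1000:]              # discard the first samples as burn-in
landings = landing_distance(speeds)
print(landings.mean(), landings.std())
```

Each retained sample is a different launch speed that is both plausible and approximately consistent with the landing constraint, which is the sense in which the sampler returns a set of solutions rather than a single one.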
A Morphable Model for the Synthesis of 3D Faces
SIGGRAPH 1999
Statistically-based shape and texture models are developed for 3D face models. The models are then used for 3D face reconstruction from single images, or for generating new faces based upon meaningful interpolations and extrapolations. 3D shape and 2D texture data are obtained for 200 face models. New shapes and textures can be expressed as a linear weighted interpolation of the existing faces. The plausibility of any given interpolated face is computed by fitting a multivariate normal distribution to the 200 faces; this is approximated using PCA. Care needs to be taken to establish correct correspondences between the mesh of the generic morphable head model and the mesh of the scanned 3D face examples. Similarly, careful preprocessing must be applied to the textures to remove variations in illumination, etc.
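The PCA-based plausibility measure can be sketched as follows, using random vectors as a stand-in for scanned faces. The function names, the number of retained components, and the 30-number "shape vectors" are assumptions for illustration only.

```python
import numpy as np

def fit_face_model(faces, n_components):
    """PCA of example faces stored as rows (flattened shape vectors)."""
    mean = faces.mean(axis=0)
    _, s, vt = np.linalg.svd(faces - mean, full_matrices=False)
    # Standard deviation of the examples along each principal direction.
    sigma = s[:n_components] / np.sqrt(len(faces) - 1)
    return mean, vt[:n_components], sigma

def log_plausibility(face, mean, components, sigma):
    """Unnormalized Gaussian log-density of a face within the PCA subspace."""
    coeffs = components @ (face - mean)
    return -0.5 * np.sum((coeffs / sigma) ** 2)

# Synthetic stand-in for 200 example faces, each a 30-number shape vector.
rng = np.random.default_rng(2)
faces = (rng.normal(size=(200, 5)) @ rng.normal(size=(5, 30))
         + 0.01 * rng.normal(size=(200, 30)))
mean, components, sigma = fit_face_model(faces, n_components=5)

# Interpolation: halfway between example 0 and the average face.
softened = mean + 0.5 * (faces[0] - mean)
# Extrapolation: example 0 pushed four times as far from the average face.
caricature = mean + 4.0 * (faces[0] - mean)

print(log_plausibility(softened, mean, components, sigma))
print(log_plausibility(caricature, mean, components, sigma))
```

The average face scores highest, and the score drops as a face moves away from it along the principal directions, so the extrapolated face is rated less plausible than the interpolated one.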
NeuroAnimator: Fast Neural Network Emulation and Control of Physics-Based Models
SIGGRAPH 1998
This paper demonstrates the possibility of replacing the numerical simulation of physics-based motion (a swinging 4-link chain, for example) with a neural network. Given training data that provides examples of next_state = f(current_state, applied_forces), backpropagation is employed to train a neural network to perform the next_state prediction. Because the neural network gives a nicely differentiable form for the next-state prediction, it is possible to apply "backpropagation through time" to compute (in an iterative fashion) the applied controls required to achieve a given task. This is demonstrated with examples of parking a car, guiding a lunar lander, and learning a swimming behavior for a virtual dolphin. Preprocessing steps on the data are important in achieving the given results.
Metropolis Light Transport
SIGGRAPH 1997
Markov-Chain Monte Carlo sampling is used to approximate global illumination integrals. To render an image, a sequence of light transport paths is generated by randomly mutating a single current path. Each such mutation is accepted or rejected to ensure that paths are sampled according to the contribution they make to the ideal image. The image is estimated by sampling many paths and recording their locations on the image plane.
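The mutate, accept-or-reject, and record loop can be sketched in one dimension. This is a deliberately crude stand-in for light transport: a "path" is a single number in [0, 1) that doubles as its own image-plane location, and the `contribution` function, pixel count, and mutation step are hypothetical.

```python
import numpy as np

def contribution(x):
    """Hypothetical image contribution of 'path' x: a bright spot on a dim background."""
    return 0.1 + np.exp(-0.5 * ((x - 0.75) / 0.05) ** 2)

def render(n_samples, n_pixels, rng, step=0.1):
    image = np.zeros(n_pixels)
    x = 0.5
    fx = contribution(x)
    for _ in range(n_samples):
        # Mutate the current path; wrapping around keeps the proposal symmetric.
        y = (x + step * rng.normal()) % 1.0
        fy = contribution(y)
        # Accept with probability min(1, f(y) / f(x)).
        if rng.random() < fy / fx:
            x, fx = y, fy
        # Record the current path's location on the image plane.
        pixel = min(int(x * n_pixels), n_pixels - 1)
        image[pixel] += 1
    # Normalized histogram: each pixel's share of the total contribution.
    return image / n_samples

rng = np.random.default_rng(3)
image = render(n_samples=20000, n_pixels=10, rng=rng)
print(np.round(image, 3))
```

The histogram converges to the image only up to an overall scale factor, so a real renderer has to estimate the total image brightness separately; that step is omitted here.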
Specifying Gestures by Example
SIGGRAPH 1991
Statistically-based recognition of single-stroke stylus gestures is proposed. A vector of 13 scalar features is extracted for every stroke. The feature vector is then classified as one of N possible gestures using a "linear machine" classifier, which determines the Bayes decision boundaries under the assumption of Gaussian distributions. The Mahalanobis distance can be used to reject "bad" gestures, although this is found to be less than ideal in that it also rejects many seemingly good gestures.
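A minimal sketch of a linear classifier of this kind, assuming equal class priors and a covariance matrix shared by all classes. The synthetic data uses 4 features rather than 13, and the function names and the rejection threshold `max_distance_sq` are hypothetical.

```python
import numpy as np

def train_linear_machine(features, labels):
    """Fit per-class means and a pooled covariance; return linear discriminants."""
    classes = np.unique(labels)
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    # Subtract each example's own class mean before pooling the covariance.
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / (len(features) - len(classes))
    cov_inv = np.linalg.inv(cov)
    weights = means @ cov_inv                       # one weight vector per class
    biases = -0.5 * np.sum(weights * means, axis=1)
    return classes, means, cov_inv, weights, biases

def classify(x, model, max_distance_sq=None):
    """Return the best-scoring class, or None if x is too far from that class mean."""
    classes, means, cov_inv, weights, biases = model
    best = int(np.argmax(weights @ x + biases))
    diff = x - means[best]
    distance_sq = diff @ cov_inv @ diff             # squared Mahalanobis distance
    if max_distance_sq is not None and distance_sq > max_distance_sq:
        return None
    return classes[best]

# Synthetic training set: 3 gesture classes, 30 examples each, 4 features.
rng = np.random.default_rng(4)
centers = np.array([[0.0, 0, 0, 0], [5.0, 0, 0, 0], [0.0, 5, 0, 0]])
labels = np.repeat(np.arange(3), 30)
features = centers[labels] + rng.normal(size=(90, 4))
model = train_linear_machine(features, labels)

print(classify(centers[1], model))
print(classify(np.full(4, 50.0), model, max_distance_sq=20.0))
```

With a shared covariance the quadratic terms cancel between classes, which is why the decision rule reduces to comparing linear scores; the Mahalanobis test is applied afterwards, only to the winning class.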

Last modified: Thu Jul 15 23:01:36 Pacific Daylight Time 2004