CPSC 530A Projects

CPSC 530A - Probabilistic Machine Learning - Nando de Freitas
Projects - Fall 2002

Note 1: The following represent preliminary abstracts, further updates will come soon.

Note 2: The abstracts are presented in alphabetical order of author's (or first author's) last name.

Pingdong Ai	Dustin Lang
Mirela Andronescu	Asher Lipson
Matthew Brown	Gabriel Mititica
Michael Chen	Ruben Morales-Menendez & Jim Mutch
Renji Chen	Kenji Okuma
James Cook, Iryna Skrypnyk & Roland Wenzel	Baharak Rastegari
Fengdong Du	Dana Sharon
Shaun Xiushan Feng	Sohrab Shah
Diana Kapsa	Kevin Smyth
Fred Kimberley	Vadym Voznyuk
Hendrik Kueck	Peng Zhao

Pingdong Ai

Abstract. Support vector machines have been used successfully in numerous real-world learning application. Simon Tong and Daphne Koller introduced SVM active learning and applied it to text classification. In my course project, I will implement their algorithm and test with experiments.

Full report (to come)

Do CANARY and CAT like each other?: Protein-DNA recognition using EM

Mirela Andronescu

Abstract. Over the past 25 years, people have tried to find rules for protein-DNA recognition. Research has proved that there is no simple, deterministic recognition code. Previous work has tried to find recognition rules by maximizing the likelihood of the effective energy of binding, using in vitro experiments. As many in vivo protein-DNA complexes became available in the last years, in this work I try a different approach, using an EM algorithm to identify words of DNA that bind to amino-acid words. Having the transition probabilities matrix, one can scan through a given protein and a given DNA sequence and give the probability of their interaction.

Probabilistic Matting

Matthew Brown

Abstract. This project will explore new approaches to alpha matting using probablistic learning and inference. This will enable users to cut and paste objects in images by simply specifying areas of texture, instead of laboriously tracing a contour.
Previous approaches to alpha matting have used learning and inference with probabilistic models of colour. Recently, I extended this idea by learning models of complete textures and using these for inference of alpha mattes (this was joint work with Andrew Blake at Microsoft Research, Cambridge, UK). In this project, I will drop the Markov assumption and attempt learning and inference using complete image functions.

Full report (to come)

EM on Mixture of Poisson

Michael Chen

Abstract. The problem rises in evaluating the rates of debris or meteoroid coming towards the space shuttle and causing damage of tiles. Loss of tiles due to debris and meteoroid makes the biggest threats to the safety of the space shuttle next to critical failure. Debris and meteoroids come to space shuttle as mixture of Poisson process. Correctly classify and evaluate the rates will significantly affect the maintenance policy, reduce the cost and enhance the reliability.

Full report (to come)

Experiments with a new sparseness inducing prior

Renji Chen

Abstract. In this course project, a new sparseness inducing prior which does not involve any hyper-parameters that need to be adjusted will be introduced. Experiments with several publicly available benchmark data sets show that this proposed approach yields state-of-the-art performance. In particular, it outperforms Support Vector Machine and performs competitively with the best alternative techniques both in terms of error rates and sparseness. My course project will be implementation of this approach proposed by Figueiredo to validate its advantages.

Reference:
M. Figueiredo, "Adaptive sparseness using Jeffreys prior". Neural Information Processing Systems - NIPS ' 2001, Vancouver, December 2001

Full report (to come)

Probabilistic Techniques for Simultaneous Localization and Map Building (SLAMB) in Mobile Robotics

James Cook, Iryna Skrypnyk and Roland Wenzel

Abstract. An important feature of an autonomous mobile robotic system is its ability to accurately localize itself while simultaneously constructing a map of its environment. This problem is complicated because of its chicken-and-egg nature: in order to determine its location the robot needs to know the map, and in order to build an accurate map the robot must know where it is. In addition, a robust system must account for the noise in odometry and sensor readings. This project explores the probabilistic methods of solving the SLAMB problem. In particular, particle filtering is iteratively applied to estimate the current location of the robot. The map is then approximated analytically using Rao-Blackwellisation, thus significantly reducing the size of the state space.

Full report (James) Full report (Iryna) Full report (Roland)

Fengdong Du

Abstract. For regression and classification applications, we look for models that not only produce good result, but also avoid unnecessary computation complexity. In this sense, LASSO is better than Ridge, since in Lasso many parameters are set to zero if their corresponding dimensions are irrelevant to our predict.
However, in the Lasso approach, we have to choose some hyper parameter, e.g. parameter b when we do quadratic programming. Choosing the hyper parameters is normally done by using cross-validation technique. This approach does not utilize the available data very well and is time consuming.
Figueiredo proposed a new approach that is hyper-parameter free and automatically computes the best parameters for Laplacian prior. This is done by decomposing the Laplacian prior into a hierachical-Bayes model and then applying EM algorithm to maximize the aposteriori.

Full report (to come)

Safe or not? --- Probabilistic Verification of Real Time Systems

Shaun Xiushan Feng

Abstract. This paper extends the previous work on probabilistic plan verification. In real world, due to the limited resource, it is highly challenging to build a 100% guaranteed safe plans. However, for most of the case, we still need some reasonable plans, which can be proved to be safe with certain threshold. In order to verify the safety of such plans, probabilistic plan verification is introduced. Håkan et al. [1] use Monte Carlo simulation to generate sample execution paths for the plans. The sampling method they used is acceptance sampling, which randomly samples along execution paths without using any prior knowledge. In this paper, I extend the current work by combining important sampling with acceptance sampling. The test results show that my extension has a better performance.

Auxiliary particle filtering applied to stochastic volatility models

Diana Kapsa

Abstract. This paper focuses the estimation of a stochastic volatility model as e.g. resulting from the Black-Scholes model. Following a Bayesian approach we try to estimate the future state of a system by relaying on past data and new incoming observations. The quality of the resulting prediction depends highly on the chosen prior respectively on the prior distribution of the hidden variables, that can be estimated recursively in time. For solving this special problem, also know as the optimal filtering problem, we will implement and use the auxiliary particle filtering method as introduced by Andrieu, Davy and Doucet (2001).

Full report (to come)

Doing the Hokey Pokey: Determining the Position of Bodyparts Based on Hand and Foot Locations

Fred Kimberley

Abstract. Animation is a time consuming process requiring highly trained specialists. Animators could save significant amounts of time if they were able to quickly sketch out an initial rough draft of the motion and then fine tune it. This project uses motion capture data to determine where body parts should be positioned based on the locations of the hands and feet.

Full report (to come)

Application of Online EM to Image Segmentation

Hendrik Kueck

Abstract. The goal of image segmentation is to identify homogeneous regions in images. One possible approach to this problem is to model the attributes of the pixels (color, local texture properties, depth, ...) as being drawn from a mixture of gaussians. Malik et. at. used the EM algorithm to cluster the pixels using such a model. One problem is that the amount of data is very large (one data point for every pixel). In this project I will use the Online EM algorithm that should perform better and allow for larger images to be segmented in reasonable time.

Full report (to come)

Does Bach sound like Bjork?: Searching in music collections

Dustin Lang

Abstract. A vast amount of music is stored in computer systems. It would be useful to be able to search in these music collections. My project seeks to provide a reasonably scalable method for finding songs that "sound similar" (in some sense) to another song or brief sound clip.

Full report (to come)

Learning Human-Like Play in Simulated Agents

Asher Lipson

Abstract. This paper suggests a novel approach to solving the problem of developing game agents that are more human-like in their game play. One wants the agent to play games as a human would, including making mistakes. The approach taken uses a multi-modal mixture model and EM to learn from previous games played by human players. This model is then used to query what action the agent should take based on its current state.

Full report (to come)

On Asymptotic Efficiency for an Importance Sampling Method

Gabriel Mititica

Abstract. Importance Sampling is a Monte Carlo simulation technique that employs a biased simulation distribution to improve computational efficiency. I consider the simulation design problem using large deviations theory. A sequence of simulations is asymptotic efficient if the number of simulations required to obtain a specified estimator precision does not grow exponentially fast. In this setting, I estimate an upper-bound on the estimator variance. The main result explores sufficient and necessary conditions for asymptotic efficiency of the candidate simulation distributions. The area of impact is that of probabilistic graphical models.

Full report (to come)

Online Monitoring of a Mobile Robot Using Particle Filtering

Ruben Morales-Menendez and Jim Mutch

Abstract. We tested several particle filtering algorithms for online monitoring in a mobile robot. The online monitoring problem consists of identifying the discrete state of a complex system (normal, faulty, etc.) using continuous measurements. We modeled the continuous and discrete states as a jump Markov linear Gaussian system and compared three algorithms: standard particle filtering, Rao-Blackwellised particle filtering, and look-ahead RBPF. Look-ahead RBPF is a new particle filtering variant, developed for fault detection and identification in industrial processes, which attempts to improve sample weighting at time t by looking ahead at data from time t+1. We wanted to test its effectiveness in the changing environment of a mobile robot; early results seem promising.

Is it difficult to track a hockey player?

Kenji Okuma

Abstract. On a digitized video sequence of a NHL hockey game, the objective is to track a hockey player by a probabilistic tracking. Our sequential Monte Carlo tracker uses the second-order AR dynamics and adaptive multi-part color model to realize a robust real-time tracking. However, we discover that the problem of tracking even a single hockey player is quite challenging.

Full report (to come)

Classification of Acute Leukemia based on DNA Microarray gene expression using EM algorithm

Baharak Rastegari

Abstract. Gene expression microarray data is now used in classification of samples into different groups, such as different type of tumors. Here, I want to apply EM algorithm to the one gene expression microarray data, and cluster them in to two groups, each of them is specific leukemia. Before that, other techniques were applied to this data, now I want to use EM algorithm and see how this one will result, and compare it with the previous ones and see if this one does better or not.

Full report (to come)

Acrobat Balancing using On-Line EM Reinforcement Learning

Dana Sharon

Abstract. Acrobat balancing is a variant of the inverted-pendulum balancing problem whereby a 2-link (1-joint) robot must learn to stabilize itself by changing the torque at it's joint. The balancing of such systems is a problematic reinforcement learning task because rewards are usually delayed and both actions and states are continuous. In (1), the authors present an on-line EM algorithm used to train both the actions of the acrobat, and the rewards. I will implement the algorithm presented in (1).
If time permits, I will also use on-line EM reinforcement learning to teach the acrobat how to achieve an inverted pendulum state by swinging from rest.

Reference
(1) Yoshimoto J., Ishii S., Sato. M. "Application of reinforcement learning to balancing of acrobat"

Full report (to come)

An EM algorithm for motif finding in biopolymer sequences

Sohrab Shah

Abstract. Finding common motif elements in a set of protein or DNA sequences helps to determine biologically functional elements contained in the set. For example, common elements to upstream regions of co-expressed genes might indicate regions that are important to the regulation of those genes. Computational techniques have been applied to this problem for more than 15 years. This paper describes the implementation of an EM algorithm due to Lawrence and Reilly (1990). In this work, I report on the formulation of the maximum likelihood and evaluate the EM algorithm's performance on artificial and real biological data.

Reference: C. E. Lawrence and A. A. Reilly. An expectation maximization (EM) algorithm for the identication and characterization of common sites in unaligned biopolymer sequences. Proteins: struct., funct., and genetics, 7:41-51, 1990.

Full report (to come)

Bayesian learning applied to Propositional Satisfiability

Kevin Smyth

Abstract. The Propositional Satisfiability problem (SAT) is a well studied problem in computer science of relevance to fields such as Artificial Intelligence and Hardware Design and Verification. This paper presents a novel application of Bayesian learning to the Propositional Satisfiability problem. By observing the trajectories of a Stochastic Local Search algorithm for SAT, the algorithm uses variable selection to determine which features (ie. propositional variables) are relevant, and then learns a Bayesian network which can be used to predict the dependencies between features. By observing only a linear portion of the trajectory, the algorithm is able to predict with fairly high accuracy the values of a significant fraction of the features. These predictions can then be used to guide the next phase of the local search, or to guide the variable ordering of a complete algorithm for SAT.

Full report (to come)

Step a Bit, Jump a Byte: Learning Linear Dynamic Systems from Motion Capture Data

Vadym Voznyuk

Abstract. Motion capture, or simply mocap, remains the method of choice for obtaining realistic human skeleton animation. It is widely used in feature film production and in video games. For certain sports video games storing motion data in raw, uncompressed form is unacceptable. Here machine learning methods could help us to compress mocap data. In this project, we explore how motion curves could be represented by linear dynamic systems (LDS). We use LDS training software for MATLAB written by Ghahramani and Hinton (technical report CRG-TR-96-2, University of Toronto). The project uses video game motion data from the Electronic Arts Canada mocap studio.

Full report (to come)

Peng Zhao

Abstract. The purpose of this paper is to present a methodology for efficient run-time interpolation between multiple forms. This problem is essentially one of scattered data interpolation. We develop a cardinal basis function, which is a combination of radial basis functions and linear polynomials, as the weight of each example on some location in the abstract space. This algorithm is efficient enough to support interactive applications such as games.

Full report (to come)