CPSC 530A - Probabilistic Machine
Learning - Nando de Freitas
Projects - Fall 2002
| Note 1: The following represent preliminary abstracts, further updates will come soon. |
| Note 2: The abstracts are presented in alphabetical order of author's (or first author's) last name. |
|
|
||
| Pingdong Ai | ||
| Abstract. Support vector machines have been used successfully in numerous real-world learning application. Simon Tong and Daphne Koller introduced SVM active learning and applied it to text classification. In my course project, I will implement their algorithm and test with experiments. | ||
|
|
|
||
| Probabilistic Matting | ||
| Matthew Brown | ||
| Abstract. This project will explore new approaches to
alpha matting using probablistic learning and inference. This will enable
users to cut and paste objects in images by simply specifying areas of
texture, instead of laboriously tracing a contour.
Previous approaches to alpha matting have used learning and inference with probabilistic models of colour. Recently, I extended this idea by learning models of complete textures and using these for inference of alpha mattes (this was joint work with Andrew Blake at Microsoft Research, Cambridge, UK). In this project, I will drop the Markov assumption and attempt learning and inference using complete image functions. |
||
|
|
|
||
| EM on Mixture of Poisson | ||
| Michael Chen | ||
| Abstract. The problem rises in evaluating the rates of debris or meteoroid coming towards the space shuttle and causing damage of tiles. Loss of tiles due to debris and meteoroid makes the biggest threats to the safety of the space shuttle next to critical failure. Debris and meteoroids come to space shuttle as mixture of Poisson process. Correctly classify and evaluate the rates will significantly affect the maintenance policy, reduce the cost and enhance the reliability. | ||
|
|
|
||
| Experiments with a new sparseness inducing prior | ||
| Renji Chen | ||
| Abstract. In this course project, a new sparseness inducing
prior which does not involve any hyper-parameters that need to be adjusted
will be introduced. Experiments with several publicly available benchmark
data sets show that this proposed approach yields state-of-the-art performance.
In particular, it outperforms Support Vector Machine and performs competitively
with the best alternative techniques both in terms of error rates and sparseness.
My course project will be implementation of this approach proposed by Figueiredo
to validate its advantages.
Reference:
|
||
|
|
|
||
| Probabilistic Techniques for Simultaneous Localization and Map Building (SLAMB) in Mobile Robotics | ||
| James Cook, Iryna Skrypnyk and Roland Wenzel | ||
| Abstract. An important feature of an autonomous mobile robotic system is its ability to accurately localize itself while simultaneously constructing a map of its environment. This problem is complicated because of its chicken-and-egg nature: in order to determine its location the robot needs to know the map, and in order to build an accurate map the robot must know where it is. In addition, a robust system must account for the noise in odometry and sensor readings. This project explores the probabilistic methods of solving the SLAMB problem. In particular, particle filtering is iteratively applied to estimate the current location of the robot. The map is then approximated analytically using Rao-Blackwellisation, thus significantly reducing the size of the state space. | ||
|
|
|
||
| Fengdong Du | ||
| Abstract. For regression and classification applications,
we look for models that not only produce good result, but also avoid unnecessary
computation complexity. In this sense, LASSO is better than Ridge, since
in Lasso many parameters are set to zero if their corresponding dimensions
are irrelevant to our predict.
However, in the Lasso approach, we have to choose some hyper parameter, e.g. parameter b when we do quadratic programming. Choosing the hyper parameters is normally done by using cross-validation technique. This approach does not utilize the available data very well and is time consuming. Figueiredo proposed a new approach that is hyper-parameter free and automatically computes the best parameters for Laplacian prior. This is done by decomposing the Laplacian prior into a hierachical-Bayes model and then applying EM algorithm to maximize the aposteriori. |
||
|
|
|
||
| Auxiliary particle filtering applied to stochastic volatility models | ||
| Diana Kapsa | ||
| Abstract. This paper focuses the estimation of a stochastic volatility model as e.g. resulting from the Black-Scholes model. Following a Bayesian approach we try to estimate the future state of a system by relaying on past data and new incoming observations. The quality of the resulting prediction depends highly on the chosen prior respectively on the prior distribution of the hidden variables, that can be estimated recursively in time. For solving this special problem, also know as the optimal filtering problem, we will implement and use the auxiliary particle filtering method as introduced by Andrieu, Davy and Doucet (2001). | ||
|
|
|
||
| Doing the Hokey Pokey: Determining the Position of Bodyparts Based on Hand and Foot Locations | ||
| Fred Kimberley | ||
| Abstract. Animation is a time consuming process requiring highly trained specialists. Animators could save significant amounts of time if they were able to quickly sketch out an initial rough draft of the motion and then fine tune it. This project uses motion capture data to determine where body parts should be positioned based on the locations of the hands and feet. | ||
|
|
|
||
| Application of Online EM to Image Segmentation | ||
| Hendrik Kueck | ||
| Abstract. The goal of image segmentation is to identify homogeneous regions in images. One possible approach to this problem is to model the attributes of the pixels (color, local texture properties, depth, ...) as being drawn from a mixture of gaussians. Malik et. at. used the EM algorithm to cluster the pixels using such a model. One problem is that the amount of data is very large (one data point for every pixel). In this project I will use the Online EM algorithm that should perform better and allow for larger images to be segmented in reasonable time. | ||
|
|
|
||
| Does Bach sound like Bjork?: Searching in music collections | ||
| Dustin Lang | ||
| Abstract. A vast amount of music is stored in computer systems. It would be useful to be able to search in these music collections. My project seeks to provide a reasonably scalable method for finding songs that "sound similar" (in some sense) to another song or brief sound clip. | ||
|
|
|
||
| Learning Human-Like Play in Simulated Agents | ||
| Asher Lipson | ||
| Abstract. This paper suggests a novel approach to solving the problem of developing game agents that are more human-like in their game play. One wants the agent to play games as a human would, including making mistakes. The approach taken uses a multi-modal mixture model and EM to learn from previous games played by human players. This model is then used to query what action the agent should take based on its current state. | ||
|
|
|
||
| On Asymptotic Efficiency for an Importance Sampling Method | ||
| Gabriel Mititica | ||
| Abstract. Importance Sampling is a Monte Carlo simulation technique that employs a biased simulation distribution to improve computational efficiency. I consider the simulation design problem using large deviations theory. A sequence of simulations is asymptotic efficient if the number of simulations required to obtain a specified estimator precision does not grow exponentially fast. In this setting, I estimate an upper-bound on the estimator variance. The main result explores sufficient and necessary conditions for asymptotic efficiency of the candidate simulation distributions. The area of impact is that of probabilistic graphical models. | ||
|
|
|
||
| Is it difficult to track a hockey player? | ||
| Kenji Okuma | ||
| Abstract. On a digitized video sequence of a NHL hockey game, the objective is to track a hockey player by a probabilistic tracking. Our sequential Monte Carlo tracker uses the second-order AR dynamics and adaptive multi-part color model to realize a robust real-time tracking. However, we discover that the problem of tracking even a single hockey player is quite challenging. | ||
|
|
|
||
| Classification of Acute Leukemia based on DNA Microarray gene expression using EM algorithm | ||
| Baharak Rastegari | ||
| Abstract. Gene expression microarray data is now used in classification of samples into different groups, such as different type of tumors. Here, I want to apply EM algorithm to the one gene expression microarray data, and cluster them in to two groups, each of them is specific leukemia. Before that, other techniques were applied to this data, now I want to use EM algorithm and see how this one will result, and compare it with the previous ones and see if this one does better or not. | ||
|
|
|
||
| Acrobat Balancing using On-Line EM Reinforcement Learning | ||
| Dana Sharon | ||
| Abstract. Acrobat balancing is a variant of the inverted-pendulum
balancing problem whereby a 2-link (1-joint) robot must learn to stabilize
itself by changing the torque at it's joint. The balancing of such systems
is a problematic reinforcement learning task because rewards are usually
delayed and both actions and states are continuous. In (1), the authors
present an on-line EM algorithm used to train both the actions of the acrobat,
and the rewards. I will implement the algorithm presented in (1).
If time permits, I will also use on-line EM reinforcement learning to teach the acrobat how to achieve an inverted pendulum state by swinging from rest. Reference
|
||
|
|
|
||
| An EM algorithm for motif finding in biopolymer sequences | ||
| Sohrab Shah | ||
| Abstract. Finding common motif elements in a set of protein
or DNA sequences helps to determine biologically functional elements contained
in the set. For example, common elements to upstream regions of co-expressed
genes might indicate regions that are important to the regulation of those
genes. Computational techniques have been applied to this problem for more
than 15 years. This paper describes the implementation of an EM algorithm
due to Lawrence and Reilly (1990). In this work, I report on the formulation
of the maximum likelihood and evaluate the EM algorithm's performance on
artificial and real biological data.
Reference: C. E. Lawrence and A. A. Reilly. An expectation maximization (EM) algorithm for the identication and characterization of common sites in unaligned biopolymer sequences. Proteins: struct., funct., and genetics, 7:41-51, 1990. |
||
|
|
|
||
| Bayesian learning applied to Propositional Satisfiability | ||
| Kevin Smyth | ||
| Abstract. The Propositional Satisfiability problem (SAT) is a well studied problem in computer science of relevance to fields such as Artificial Intelligence and Hardware Design and Verification. This paper presents a novel application of Bayesian learning to the Propositional Satisfiability problem. By observing the trajectories of a Stochastic Local Search algorithm for SAT, the algorithm uses variable selection to determine which features (ie. propositional variables) are relevant, and then learns a Bayesian network which can be used to predict the dependencies between features. By observing only a linear portion of the trajectory, the algorithm is able to predict with fairly high accuracy the values of a significant fraction of the features. These predictions can then be used to guide the next phase of the local search, or to guide the variable ordering of a complete algorithm for SAT. | ||
|
|
|
||
| Step a Bit, Jump a Byte: Learning Linear Dynamic Systems from Motion Capture Data | ||
| Vadym Voznyuk | ||
| Abstract. Motion capture, or simply mocap, remains the method of choice for obtaining realistic human skeleton animation. It is widely used in feature film production and in video games. For certain sports video games storing motion data in raw, uncompressed form is unacceptable. Here machine learning methods could help us to compress mocap data. In this project, we explore how motion curves could be represented by linear dynamic systems (LDS). We use LDS training software for MATLAB written by Ghahramani and Hinton (technical report CRG-TR-96-2, University of Toronto). The project uses video game motion data from the Electronic Arts Canada mocap studio. | ||
|
|
|
||
| Peng Zhao | ||
| Abstract. The purpose of this paper is to present a methodology for efficient run-time interpolation between multiple forms. This problem is essentially one of scattered data interpolation. We develop a cardinal basis function, which is a combination of radial basis functions and linear polynomials, as the weight of each example on some location in the abstract space. This algorithm is efficient enough to support interactive applications such as games. | ||
|