Stat 535C - CPSC 535D Project Suggestions

Most projects rely on published journal papers/technical reports.
I expect more from you than simply reproducing the results therein.
Extensions or variations of the proposed models, cleverer algorithms to fit these models,
and original applications are very welcome.


Bayesian Partition Models

The idea is to split the predictor space X into an unknown number of disjoint regions. Within each region the data are assumed to be exchangeable and to come from simple distributions.
One can use a Voronoi tessellation of the predictor space based on Euclidean distance. Propose a Bayesian model for the centers of these regions, their number and the
unknown parameters associated with these regions. Develop an MCMC algorithm to fit this model. Evaluate the performance of this approach on several datasets.
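
As a minimal sketch of the partition itself (not of the Bayesian model or the MCMC sampler), the following Python code assigns each predictor to its Voronoi region, i.e. to the nearest center in Euclidean distance. The function name assign_regions and the toy data are assumptions made for illustration only.

```python
import numpy as np

def assign_regions(X, centers):
    """Return, for each row of X, the index of the nearest center (Euclidean)."""
    # Pairwise squared distances, shape (n_points, n_centers)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy example: three predictors in the unit square, two centers
X = np.array([[0.1, 0.2], [0.9, 0.8], [0.3, 0.6]])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
regions = assign_regions(X, centers)
print(regions)
```

In an MCMC scheme the centers (and their number) would change from one iteration to the next, and this assignment would be recomputed for each proposed configuration.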


Please read Bayesian partitioning for classification and regression by D. Denison, C.C. Holmes and B. Mallick, Technical report, Imperial College, 2001.

Bayesian Polychotomous Logistic Regression through Auxiliary Variables

Develop a Bayesian model for polychotomous (i.e. multiclass) logistic regression based on auxiliary variables. Then extend the model and its sampling algorithm so that variable selection is
additionally performed, reducing the set of explanatory variables. Assess the performance of your model and algorithm on several datasets.
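
For orientation, the standard latent-variable representation of the binary logistic model can be written as follows. This is only the two-class case; the notation (z_i, x_i, beta, epsilon_i) is an assumption introduced here and need not match the notation of the paper.

```latex
\begin{align*}
  z_i &= x_i^\top \beta + \varepsilon_i, \qquad
  \varepsilon_i \sim \mathrm{Logistic}(0, 1), \qquad
  y_i = \mathbf{1}\{ z_i > 0 \}, \\
  \Pr(y_i = 1 \mid \beta) &= \Pr(\varepsilon_i > -x_i^\top \beta)
  = \frac{1}{1 + \exp(-x_i^\top \beta)} .
\end{align*}
```

Conditioning on the auxiliary variables z_i is what makes the update of beta tractable; the polychotomous case needs one such construction per class.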

Please read Bayesian auxiliary variable models for binary and polychotomous regression by C.C. Holmes and K. Held, Bayesian Analysis, 2004.

Bayesian approach to recombination hotspots in DNA sequence

Recombination hotspots can be defined as small regions along a DNA sequence where the recombination rate is increased significantly relative
to the local background rate.

Please read Recombination hotspots as a point process by De Iorio et al., Philosophical Transactions of the Royal Society: Biological Sciences, 2005.

Minimization of Bayesian Loss Functions using Stochastic Approximation

For standard loss functions (e.g. quadratic, 0/1), the associated estimators have well-known expressions and can easily be estimated
using the output of an MCMC algorithm sampling the posterior. However, if the loss function is not standard, the minimization can be very tricky.
The aim of this project is to develop a stochastic approximation (e.g. Robbins-Monro) algorithm, combined with the MCMC output,
to optimize non-standard loss functions. This algorithm will be demonstrated on several (non-trivial) examples.
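
The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed algorithm: the loss is a hypothetical pseudo-Huber loss sqrt(1 + (a - theta)^2) - 1, the "MCMC output" is replaced by independent N(1, 1) draws, and the step sizes c / (k + 1) are one common Robbins-Monro choice. Because this loss and this stand-in posterior are both symmetric about 1, the minimizer is known to be 1, which gives a simple check.

```python
import numpy as np

def loss_grad(a, theta):
    """Derivative in a of the pseudo-Huber loss sqrt(1 + (a - theta)^2) - 1."""
    r = a - theta
    return r / np.sqrt(1.0 + r * r)

def robbins_monro(samples, a0=0.0, c=2.0):
    """Minimise the posterior expected loss using one posterior draw per step."""
    a = a0
    for k, theta in enumerate(samples, start=1):
        step = c / (k + 1)  # steps sum to infinity, their squares are summable
        a -= step * loss_grad(a, theta)
    return a

# Stand-in for MCMC output: i.i.d. draws from a N(1, 1) "posterior" (assumption)
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=1.0, size=20000)
a_hat = robbins_monro(samples)
print(a_hat)
```

With genuine MCMC output the draws are correlated, so the choice of step sizes and any averaging of the iterates deserve more care than in this sketch.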

A Bayesian Nonparametric Approach to Inference for Spatial Poisson Processes

We want a method to analyze spatial point patterns, which are assumed to arise as a set of observations from a spatial non-homogeneous Poisson process.
The spatial point pattern is observed in a bounded region, which, for most applications, is taken to be a rectangle in the space where the process is defined.
The method is based on modeling a density function, defined on this bounded region, that is directly related to the intensity function of the Poisson process.
We want to develop a flexible nonparametric Bayesian mixture model for this density using a bivariate Beta distribution for the mixture kernel and a Dirichlet process prior
for the mixing distribution. An MCMC algorithm will be developed to fit this model.
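
The link between the density and the intensity can be made explicit with the usual factorization of a non-homogeneous Poisson process likelihood on a bounded region R. The symbols lambda, gamma, f and s_i are notation assumed here for illustration.

```latex
\begin{align*}
  \lambda(s) &= \gamma\, f(s), \qquad
  \gamma = \int_R \lambda(u)\, du, \qquad
  \int_R f(u)\, du = 1, \\
  p(s_1, \dots, s_n \mid \gamma, f)
  &= \exp\!\Big(-\int_R \lambda(u)\, du\Big) \prod_{i=1}^{n} \lambda(s_i)
  = \exp(-\gamma)\, \gamma^{n} \prod_{i=1}^{n} f(s_i) .
\end{align*}
```

The total intensity gamma and the density f thus enter the likelihood as separate factors, so the mixture model only has to describe f.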

Please read: A Bayesian Nonparametric Approach to Inference for Spatial Poisson Processes  by A. Kottas and B. Sanso, Technical Report, UCSC, 2005.

Bayesian Model-based Subspace Clustering

We propose a model-based approach to identifying clusters of objects based on subsets of attributes, so that the attributes that distinguish a cluster from the rest of
the population, called an attribute ensemble, may depend on the cluster being considered. The model is based on a Pólya urn cluster model, which is
equivalent to a Dirichlet process mixture of multivariate normal distributions.
This model-based approach allows for the incorporation of application-specific data features into the clustering scheme. For example, in an analysis of
genetic CGH array data we account for spatial correlation of genetic abnormalities along the genome.
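
To fix ideas about the Pólya urn scheme, here is a minimal Python sketch that draws a random partition of n objects from the urn with concentration parameter alpha. It covers only the prior on cluster labels, not the attribute ensembles or the normal mixture components; the function name polya_urn_partition is an assumption.

```python
import numpy as np

def polya_urn_partition(n, alpha, rng):
    """Draw cluster labels for n >= 1 objects from a Polya urn scheme."""
    labels = [0]   # the first object opens the first cluster
    counts = [1]   # current cluster sizes
    for _ in range(1, n):
        # Join an existing cluster with probability proportional to its size,
        # or open a new cluster with probability proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

rng = np.random.default_rng(1)
labels, counts = polya_urn_partition(50, alpha=1.0, rng=rng)
print(len(counts), "clusters with sizes", counts)
```

Larger values of alpha make new clusters more likely, so the expected number of clusters grows with alpha.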

Please read: Model-based subspace clustering by Hoff, P.D. (2006), Bayesian Analysis, vol. 1 no. 2, 321-344.

A Bayesian Approach to Small n, Large p Problems

When you have too many explanatory variables, you need to do something... This paper proposes a Bayesian version of the SVD.

Please read: Bayesian Factor Regression Models in the "Large p, Small n" Paradigm by M. West, Bayesian Statistics, 2003.

A Bayesian Approach to the Lasso and the Elastic Net

The Lasso (Tibshirani, 1996) and the Elastic Net (Zou and Hastie, 2005) are two extremely popular methods to perform regression in high-dimensional
linear models. These methods are NOT Bayesian. The aim of this project is to derive and implement a Bayesian version
of these methods.
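
A natural starting point is the well-known correspondence between the Lasso estimate and a posterior mode. The derivation below is a sketch under the assumptions of Gaussian noise with known variance sigma^2 and independent Laplace priors with rate tau; these symbols are introduced here for illustration.

```latex
\begin{align*}
  y \mid \beta &\sim \mathcal{N}(X\beta, \sigma^2 I), \qquad
  p(\beta_j) = \frac{\tau}{2} \exp(-\tau |\beta_j|), \quad j = 1, \dots, p, \\
  -\log p(\beta \mid y) &= \frac{1}{2\sigma^2} \| y - X\beta \|_2^2
  + \tau \sum_{j=1}^{p} |\beta_j| + \text{const}, \\
  \hat{\beta}_{\mathrm{MAP}} &= \arg\min_{\beta}
  \Big\{ \| y - X\beta \|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big\},
  \qquad \lambda = 2 \sigma^2 \tau .
\end{align*}
```

The posterior mode therefore coincides with the Lasso solution, whereas a fully Bayesian treatment works with the whole posterior and with priors on sigma^2 and tau.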

Please read: Alternative prior distributions for variable selection with very many more variables than observations by J. Griffin and P. Brown, Technical report, 2005.

Efficient Bayesian Inference for Multiple Changepoint Problems

The aim of this project is to perform Bayesian inference in situations where multiple changepoints occur in a time series, the number of changepoints being unknown.
The class of models assumes independence between the posterior distributions of the parameters associated with segments of data between successive changepoints.
Exact calculations can be performed but the computational complexity is quadratic in the number of observations.
In a sequential framework, it is thus necessary to resort to approximations. You should develop several sequential Monte Carlo algorithms
for this setting.
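
The quadratic cost of the exact calculation can be seen in the following Python sketch of a backward recursion for the marginal likelihood. It is an illustration under simplifying assumptions that are not taken from the papers: a changepoint occurs between any two consecutive observations independently with probability p, and within a segment y_i ~ N(mu, 1) with mu ~ N(0, tau2). The function names are hypothetical.

```python
import numpy as np

def make_seg_log_ml(y, tau2=1.0):
    """Return f(t, e), the log marginal likelihood of the segment y[t:e],
    using cumulative sums so that each evaluation costs O(1)."""
    c1 = np.concatenate(([0.0], np.cumsum(y)))
    c2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def f(t, e):
        m, s, ss = e - t, c1[e] - c1[t], c2[e] - c2[t]
        return (-0.5 * m * np.log(2 * np.pi) - 0.5 * np.log1p(m * tau2)
                - 0.5 * (ss - tau2 * s * s / (1.0 + m * tau2)))
    return f

def log_evidence(y, p=0.2, tau2=1.0):
    """Log marginal likelihood of y, summing over all changepoint configurations."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    f = make_seg_log_ml(y, tau2)
    # log_q[t] = log Pr(y[t:] | a new segment starts at t); empty tail gives 0
    log_q = np.zeros(n + 1)
    for t in range(n - 1, -1, -1):           # n values of t ...
        terms = []
        for e in range(t + 1, n + 1):        # ... times up to n segment ends
            stay = (e - t - 1) * np.log1p(-p)  # no changepoint inside y[t:e]
            if e < n:
                terms.append(f(t, e) + stay + np.log(p) + log_q[e])
            else:
                terms.append(f(t, e) + stay)   # last segment runs to the end
        log_q[t] = np.logaddexp.reduce(terms)
    return log_q[0]

y = np.array([0.3, -1.2, 2.0, 2.5, 1.9])
print(log_evidence(y, p=0.2))
```

The double loop visits every pair (t, e), hence the quadratic cost; a sequential Monte Carlo approximation would keep only a subset of the candidate segment start points at each time step.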

Please read: Exact and Efficient Bayesian inference for Multiple Changepoint Problems by P. Fearnhead, Statistics and Computing, 2006, to appear.
             Online Inference for Multiple Changepoint Problems by P. Fearnhead and Z. Liu, Technical report, 2006.

Sequential Bayesian Estimation for Blind Equalization in Fast Fading Channels

The aim of this project is to develop efficient sequential Bayesian estimation schemes for blind equalization in flat
and frequency-selective fast fading channels.

Please read: Sequential Monte Carlo Methods for Digital Communications, Elena Punskaya, PhD thesis, Cambridge University Engineering Department, 2003.

Bayesian Inference for Partially Observed Diffusions

The aim of this project is to perform Bayesian inference for partially observed diffusion models and apply it
to some financial time series models.

Please read: Likelihood based inference for diffusion driven models  by S. Chib, M.K. Pitt and N. Shephard, Technical report, 2004.