Stat 535C - CPSC 535D Project Suggestions
Most projects rely on published
journal papers/technical reports.
I expect more from you than simply reproducing the results
therein.
Extensions or variations of the proposed models, cleverer
algorithms to fit these models,
and original applications are very welcome.
Bayesian Partition Models
The idea is to split the predictor
space X into an unknown number of disjoint regions. Within each region
the data is assumed to be exchangeable and to come from simple
distributions.
One can propose to use a Voronoi tessellation of the predictor space
based on Euclidean distance. Propose a Bayesian model for the centers
of these regions, their number and the
unknown parameters associated with each region. Develop an MCMC
algorithm to fit this model. Evaluate the performance of this approach
on several datasets.
Please read
Bayesian partitioning for classification and regression by
D. Denison, C.C. Holmes and B. Mallick, Technical report Imperial
College, 2001.
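As a rough illustration of the Voronoi partition described above, the sketch below assigns each predictor to its nearest center and summarizes the response within each region. The function names, the toy data and the fixed centers are assumptions made for illustration only; in the project the centers and their number would be sampled by MCMC rather than fixed.

```python
import numpy as np

def voronoi_assign(X, centers):
    """Index of the nearest center (Euclidean distance) for each row of X."""
    # Squared distances, shape (n_points, n_centers)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def region_means(X, y, centers):
    """Region labels and per-region sample mean of y (NaN for empty regions)."""
    labels = voronoi_assign(X, centers)
    k = len(centers)
    counts = np.bincount(labels, minlength=k)
    sums = np.bincount(labels, weights=y, minlength=k)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return labels, means

# Toy data: the response jumps when the first predictor crosses 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = (X[:, 0] > 0.5).astype(float) + 0.1 * rng.normal(size=200)

# Hypothetical fixed centers whose Voronoi boundary is the line x1 = 0.5.
centers = np.array([[0.25, 0.5], [0.75, 0.5]])
labels, means = region_means(X, y, centers)
```

Within each region the data are treated as exchangeable, so a simple conjugate model per region would replace the sample means used here.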
Bayesian Polychotomous Logistic
Regression through Auxiliary Variables
Develop a Bayesian model for polychotomous (i.e. multiclass) logistic
regression based on auxiliary variables, together with an algorithm to
fit it. Then extend the model and the algorithm to the case where
variable selection is
additionally performed so as to reduce the set of explanatory
variables. Assess the performance of your model and algorithm on
several datasets.
Please read
Bayesian auxiliary variable models for binary and polychotomous
regression by C.C. Holmes and L. Held, Bayesian Analysis,
2004.
Bayesian approach to recombination hotspots in DNA sequence
Recombination hotspots can be defined as small regions along a DNA
sequence where the recombination rate is increased significantly
relative to the local background rate.
Please read Recombination hotspots as a point
process by De Iorio et al., Philosophical Transactions of the Royal
Society: Biological Sciences, 2005.
Minimization of Bayesian Loss Functions using Stochastic
Approximation
For standard loss functions (e.g. quadratic, 0/1), the associated
estimators admit well-known expressions and can easily be computed
from the output of an MCMC algorithm sampling from the posterior. However,
if the loss function is not standard, the minimization can be very
tricky.
The aim of this project is to develop a stochastic approximation (e.g.
Robbins-Monro) algorithm combined with the MCMC output
to optimize non-standard loss functions. This algorithm will be
demonstrated on several (non-trivial) examples.
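A minimal sketch of the idea, under stated assumptions: each posterior draw is used once to take a noisy gradient step on the expected loss, with step sizes c / n^alpha for alpha in (0.5, 1]. The check (pinball) loss is used here only because its minimizer, a posterior quantile, is known and the result can be verified; the function names, the step-size constants and the normal draws standing in for MCMC output are all illustrative choices.

```python
import numpy as np

def robbins_monro(draws, grad, a0=0.0, c=1.0, alpha=0.7):
    """Stochastic approximation: a <- a - (c / n**alpha) * grad(a, theta_n)."""
    a = a0
    for n, theta in enumerate(draws, start=1):
        a -= c / n**alpha * grad(a, theta)
    return a

def pinball_grad(tau):
    """(Sub)gradient in a of the check loss
    L(a, theta) = tau * max(theta - a, 0) + (1 - tau) * max(a - theta, 0)."""
    def grad(a, theta):
        return (1.0 - tau) if theta < a else -tau
    return grad

# Stand-in for MCMC output: independent draws from a N(2, 1) "posterior".
rng = np.random.default_rng(1)
draws = rng.normal(loc=2.0, scale=1.0, size=50_000)

# The estimate should be close to the 0.9 posterior quantile.
a_hat = robbins_monro(draws, pinball_grad(0.9))
```

For a genuinely non-standard loss only `grad` changes; correlated MCMC draws and the tuning of the step sizes are the points the project would need to address.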
A Bayesian Nonparametric Approach to Inference for Spatial Poisson
Processes
We want a method to analyze spatial point patterns, which
are assumed to arise as a set of observations from a spatial
non-homogeneous Poisson process.
The spatial point pattern is observed
in a bounded region, which, for most applications, is taken to be a
rectangle in the space where the process is defined.
The method is
based on modeling a density function, defined on this bounded region,
that is directly related to the intensity function of the Poisson
process.
We want to develop a flexible nonparametric Bayesian mixture model for
this
density using a bivariate Beta distribution for the mixture kernel and
a Dirichlet process prior
for the mixing distribution. An MCMC algorithm will be developed to fit
this model.
Please read: A Bayesian Nonparametric Approach to
Inference for Spatial Poisson Processes by A. Kottas
and B. Sanso, Technical Report, UCSC, 2005.
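To make the link between the density and the intensity concrete, the sketch below assumes the intensity factorizes as gamma * f(s), where gamma is the total intensity over the region and f is a density on the unit square. Under that assumption the number of points is Poisson(gamma) and, given the count, the points are independent draws from f. Here f is a finite mixture whose components are products of two independent Beta densities; this is a simplification of the bivariate Beta kernel mentioned above, and all names and parameter values are hypothetical.

```python
import numpy as np

def simulate_nhpp(gamma, weights, shapes, rng):
    """Simulate a Poisson process on the unit square with intensity gamma * f.

    f is a finite mixture; row j of `shapes` is (a1, b1, a2, b2), meaning
    component j is the product density Beta(a1, b1) x Beta(a2, b2).
    """
    n = rng.poisson(gamma)                      # total number of points
    comp = rng.choice(len(weights), size=n, p=weights)
    a1, b1, a2, b2 = shapes[comp].T             # per-point shape parameters
    return np.column_stack([rng.beta(a1, b1), rng.beta(a2, b2)])

rng = np.random.default_rng(2)
weights = np.array([0.6, 0.4])
shapes = np.array([[2.0, 5.0, 2.0, 2.0],
                   [6.0, 2.0, 3.0, 8.0]])
points = simulate_nhpp(500.0, weights, shapes, rng)
```

Simulated patterns of this kind are a convenient check for a fitting algorithm, since the true density is known.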
Bayesian Model-based Subspace
Clustering
We propose a model-based approach to identifying clusters of objects
based on subsets of attributes, so that the attributes that distinguish
a cluster from the rest of
the population, called an attribute ensemble, may depend on the cluster
being considered. The model is based on a Pólya urn cluster model,
which is
equivalent to a Dirichlet process mixture of multivariate normal
distributions.
This model-based approach allows for the incorporation of
application-specific data features into the clustering scheme. For
example, in an analysis of
genetic CGH array data we account for spatial correlation of genetic
abnormalities along the genome.
Please read: Model-based
subspace clustering by Hoff, P.D. (2006), Bayesian Analysis, vol. 1 no. 2, 321-344.
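The Pólya urn scheme mentioned above can be sketched as follows: each new object joins an existing cluster with probability proportional to that cluster's size, or starts a new cluster with probability proportional to a concentration parameter. The function name, the concentration value and the normal base measure for the cluster means are illustrative assumptions; the attribute-subset selection of the paper is not represented.

```python
import numpy as np

def polya_urn_labels(n, alpha, rng):
    """Draw cluster labels for n objects from a Polya urn with concentration alpha."""
    labels = np.zeros(n, dtype=int)
    counts = [1]                                  # first object opens cluster 0
    for i in range(1, n):
        # Existing clusters weighted by size, a new cluster weighted by alpha.
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                      # open a new cluster
        else:
            counts[k] += 1
        labels[i] = k
    return labels

rng = np.random.default_rng(4)
labels = polya_urn_labels(300, alpha=1.0, rng=rng)

# Mixture of bivariate normals: one mean per cluster, unit-variance noise.
n_clusters = labels.max() + 1
means = rng.normal(0.0, 3.0, size=(n_clusters, 2))
X = means[labels] + rng.normal(size=(300, 2))
```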
A Bayesian Approach to Small n, Large p Problems
When there are many more explanatory variables than observations,
standard regression breaks down and some form of dimension reduction
is needed. The paper below proposes a Bayesian version of the SVD.
Please read: Bayesian
Factor Regression Models in the
"Large p, Small n"
Paradigm by M. West, Bayesian
Statistics, 2003.
A Bayesian Approach to the Lasso
and the Elastic Net
The Lasso (Tibshirani, 1996) and the Elastic Net (Zou and Hastie, 2005) are two
extremely popular methods to
perform regression in high-dimensional
linear models. These methods are NOT Bayesian. The aim of this
project is to derive and implement a Bayesian version
of these methods.
Please read: Alternative
prior distributions for variable selection with very many more
variables than observations by J. Griffin and P. Brown, Technical
report, 2005.
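One possible starting point, offered as a sketch rather than as the method of the paper above: the Lasso estimate coincides with the posterior mode under independent Laplace (double-exponential) priors on the coefficients, and a Laplace prior can be written as a normal with an exponentially distributed variance. The code below checks this scale-mixture representation by simulation; the function name and the value of the rate `lam` are illustrative.

```python
import numpy as np

def laplace_via_scale_mixture(lam, size, rng):
    """Draw beta with density (lam / 2) * exp(-lam * |beta|) in two stages:
    tau2 ~ Exponential(rate = lam**2 / 2), then beta | tau2 ~ N(0, tau2)."""
    tau2 = rng.exponential(scale=2.0 / lam**2, size=size)
    return rng.normal(loc=0.0, scale=np.sqrt(tau2))

rng = np.random.default_rng(3)
lam = 2.0
beta = laplace_via_scale_mixture(lam, 200_000, rng)

# Direct Laplace draws with the same scale 1 / lam, for comparison.
direct = rng.laplace(loc=0.0, scale=1.0 / lam, size=200_000)
```

Conditionally on the variances the prior is Gaussian, which is what makes Gibbs-type samplers for such models convenient.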
Efficient Bayesian Inference for Multiple Changepoint Problems
The aim of this project is to perform Bayesian inference in situations
where multiple changepoints occur in a time series, the number of
changepoints being unknown.
The class of models assumes
independence between the posterior distributions of the parameters
associated
with segments of data between successive changepoints.
Exact calculations can be performed but the computational complexity is
quadratic in the number of observations.
In a sequential framework, it is thus necessary to perform some
approximations. The aim of this project is to develop several
sequential Monte Carlo algorithms
in such contexts.
Please read: Exact and
Efficient Bayesian inference for Multiple Changepoint Problems by
P. Fearnhead, Statistics and
Computing, 2006. To appear.
Online
Inference for Multiple Changepoint Problems by P. Fearnhead &
Z. Liu, Technical report, 2006.
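To see where the quadratic cost comes from, here is a minimal sketch of an exact backward recursion under simplifying assumptions that are not taken from the papers above: a changepoint occurs after each observation independently with probability p, observations are normal with unit variance, and each segment mean has an independent N(0, s0^2) prior. Because segments are independent, the evidence is obtained by summing over the end of the first segment, which requires the marginal likelihood of every segment (t, s).

```python
import numpy as np

def log_seg_marginal(cs, cs2, t, s, s0=2.0):
    """Log marginal likelihood of y[t..s] (inclusive, 0-based) as one segment."""
    m = s - t + 1
    S = cs[s + 1] - cs[t]          # sum of the segment
    SS = cs2[s + 1] - cs2[t]       # sum of squares of the segment
    return (-0.5 * m * np.log(2 * np.pi) - 0.5 * np.log1p(m * s0**2)
            - 0.5 * (SS - s0**2 * S**2 / (1 + m * s0**2)))

def log_evidence(y, p=0.1, s0=2.0):
    """Log marginal likelihood of y, summing over all changepoint configurations."""
    n = len(y)
    cs = np.concatenate([[0.0], np.cumsum(y)])
    cs2 = np.concatenate([[0.0], np.cumsum(y**2)])
    # log_q[t] = log p(y[t:] | a segment starts at t); log_q[n] = 0 (empty tail).
    log_q = np.zeros(n + 1)
    for t in range(n - 1, -1, -1):
        terms = []
        for s in range(t, n):      # s is the last index of the segment starting at t
            lp = log_seg_marginal(cs, cs2, t, s, s0) + (s - t) * np.log1p(-p)
            if s < n - 1:          # a changepoint follows s, then the rest of the data
                lp += np.log(p) + log_q[s + 1]
            terms.append(lp)
        log_q[t] = np.logaddexp.reduce(np.array(terms))
    return log_q[0]

y = np.array([0.1, -0.3, 2.0, 2.5, 1.9])
log_ev = log_evidence(y, p=0.3)
```

The double loop over (t, s) is the quadratic cost; a sequential Monte Carlo scheme would keep only a subset of candidate positions for the most recent changepoint.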
Sequential Bayesian Estimation for
Blind Equalization in Fast Fading Channels
The aim of this project is to develop efficient sequential Bayesian
estimation schemes for blind equalization in flat
and frequency-selective fast fading channels.
Please read: Sequential
Monte Carlo Methods for Digital Communications, Elena Punskaya, PhD
thesis, Cambridge University Engineering Department, 2003.
Bayesian Inference for Partially
Observed Diffusions
The aim of this project is to perform Bayesian inference for partially
observed diffusion models and apply it
to some financial time series models.
Please read: Likelihood
based inference for diffusion driven models by S. Chib, M.K.
Pitt and N. Shephard, Technical report, 2004.