Final project for CS532c
There are several kinds of possible projects:
- Application of graphical models to some domain.
This could either be your own research problem, or
you could try reproducing results of someone else's paper.
In this case, you may use an existing software package, such
as my
BNT,
Netica,
or other ones listed
here.
- Implement an existing algorithm.
In this case, you are strongly encouraged to implement the code in
such a way as to make it (re)usable by others.
For example, you could write it in Matlab and integrate it with my BNT
toolbox.
- Develop a new algorithm.
In this case, you will need to demonstrate (theoretically and/or
empirically) why your algorithm is
better than other techniques.
Some possible project suggestions are below.
(Some of these may also be suitable for masters theses.)
In addition to the textbooks handed out in class,
you may find the book
"Probabilistic Networks and Expert Systems" by Cowell, Dawid,
Lauritzen and Spiegelhalter (1999) helpful.
I refer to this below as "the Cowell book".
Bayesian estimation
Matlab interface to BUGS
BUGS
(Bayesian inference Using Gibbs Sampling)
is an extremely popular package for doing Gibbs sampling in
hierarchical Bayesian models (i.e., Bayes nets where the hidden nodes
are parameters, not discrete random variables).
BUGS is a standalone executable for Unix or Windows (the latter is called
WinBUGS).
It is also callable from
the statistical language R
using
bugsR,
and from Matlab
using this code.
The goal of this project
is to develop a way to represent a hierarchical Bayesian model in
Matlab (e.g., as an extension of BNT).
One should then be able to solve the model using BUGS or using some
other program (such as VIBES).
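For concreteness, the Matlab-side representation could be as simple as
a struct listing each node's distribution and parents; the sketch below
is purely hypothetical (the field names and the toy model are invented
for illustration, not an existing BNT API):

  % Hypothetical sketch: y ~ N(theta, 1) with prior theta ~ N(0, 100).
  model.nodes    = {'theta', 'y'};
  model.dist     = {'normal', 'normal'};
  model.parents  = {{}, {'theta'}};
  model.params   = {{0, 100}, {'theta', 1}};   % {mean, variance} pairs
  model.observed = {'y'};
  % A translator would walk this struct, emit the equivalent BUGS model
  % file, invoke BUGS, and read the posterior samples back into Matlab.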
This project would be suitable for someone who wants to learn more
about BUGS and mainstream Bayesian statistics.
Comparing VIBES and BUGS
VIBES
(Variational Inference for Bayesian networks)
is an alternative to BUGS that uses a deterministic mean field
approximation instead of sampling. VIBES is open-source Java; there is
also a Matlab interface.
The goal of this project is to compare speed vs accuracy of the mean
field and the Gibbs sampling methods on various problems in Bayesian
estimation. (For discrete random variables, e.g. Ising models, mean
field is usually much faster, and loopy belief propagation is even
better, but for continuous (non Gaussian) random variables, it's not
so clear.)
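To give a feeling for the mean field side, the updates for an Ising
model (uniform coupling J, field h, states +/-1, free boundaries) are
just a few lines of Matlab; a minimal, undamped sketch:

  % Mean field for a 2D Ising model: iterate the fixed point equations
  % m(i,j) = tanh(J * (sum of neighboring means) + h).
  function m = ising_meanfield(n, J, h, niter)
  m = 2*rand(n) - 1;                 % random init breaks symmetry
  for it = 1:niter
    s = zeros(n);                    % sum of neighboring means
    s(2:end,:)   = s(2:end,:)   + m(1:end-1,:);
    s(1:end-1,:) = s(1:end-1,:) + m(2:end,:);
    s(:,2:end)   = s(:,2:end)   + m(:,1:end-1);
    s(:,1:end-1) = s(:,1:end-1) + m(:,2:end);
    m = tanh(J*s + h);               % may oscillate for large J
  end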
See also
Matt
Beal's page for variational Bayes stuff.
Re-implement VIBES in Matlab
VIBES is written in Java.
It would be useful, for both teaching and research,
to have a Matlab version.
The goal of this project is to reimplement (some fraction of) VIBES in
Matlab. For example, you could restrict attention to the mean field
version, which does not exploit tractable substructure.
This project would be useful for someone who wants to learn
Variational Bayes really well.
Approximate sequential Bayesian updating
Cowell
et al (1994) studied the problem of approximate sequential Bayesian
parameter estimation for discrete Bayes nets when there is missing
data.
(See also
Cowell
(1996) and Cowell book p204.)
The method is closely related to
moment
matching. The goal of this project is to implement this algorithm
for the general exponential family (including graphical models).
A good starting point might be
ReBEL, a toolbox for
approximate sequential Bayesian updating in the Gaussian case (i.e.,
using Kalman filters and extensions).
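The key operation is easy to state: after absorbing evidence, the (no
longer tractable) posterior p is projected back onto the chosen
exponential family q_theta(x) = h(x) exp(theta' phi(x) - A(theta)) by
minimizing KL divergence, which reduces to matching expected
sufficient statistics:

  \min_\theta \mathrm{KL}(p \,\|\, q_\theta)
  \quad\Longleftrightarrow\quad
  \mathbb{E}_{q_\theta}[\phi(x)] = \mathbb{E}_{p}[\phi(x)].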
Expectation propagation (EP)
EP
generalizes approximate sequential Bayesian updating (see above) to the offline (batch)
case, allowing it to "go back" and fix up errors made earlier in the
process.
The goal of this project is to develop a toolbox for EP.
First, develop moment-matching filtering for
the exponential family (see the project above);
then add EP methods on top.
(This is hard!)
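For reference, each EP iteration refines one approximate factor
\tilde{f}_i of the current approximation q: delete it, multiply in the
exact factor f_i, project back by moment matching (the operation from
the previous project), and store the ratio:

  q^{\setminus i}(x) \propto q(x) / \tilde{f}_i(x), \qquad
  q^{\mathrm{new}}(x) = \mathrm{proj}\big[ f_i(x)\, q^{\setminus i}(x) \big], \qquad
  \tilde{f}_i(x) \leftarrow q^{\mathrm{new}}(x) / q^{\setminus i}(x).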
Speeding up inference
Automated code synthesis for inference
Many inference algorithms, such as junction tree and factor
graph message passing, are slow because they require manipulating
(multiplying, dividing and marginalizing)
multi-dimensional arrays (factors) using general purpose procedures;
similar comments apply to sampling algorithms, such as particle filtering.
The goal of this project is to automatically generate optimized matlab
or C code for doing inference in a specific model (with a specific
evidence pattern, i.e., you know which nodes will be observed; assume
that any hidden node can be queried). Thus
you can "compile" most of the complex indexing into simple code for
manipulating static arrays (whose size is known at compile time).
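To make the transformation concrete, here is one factor product in the
generic style (sizes dA, dB, dC known only at run time) and a comment
indicating what a code generator would emit instead; this is just an
illustration of the idea, not output of an actual tool:

  % Generic path: multiply a factor over (A,B) by a factor over (B,C),
  % giving a factor over (A,B,C), using runtime reshaping/replication.
  psi = repmat(reshape(phi1, [dA dB 1]), [1 1 dC]) .* ...
        repmat(reshape(phi2, [1 dB dC]), [dA 1 1]);
  % A generator that knows dA = dB = dC = 2 at compile time could emit
  % the same product as straight-line static array arithmetic, with no
  % size logic or index computation at run time.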
Some related papers you should read:
This would make a good project for someone interested in software
engineering, theorem proving, symbolic algebra, etc.
Efficient exact inference in first order probabilistic models
First-order Bayes nets have a lot of context specific independence (CSI),
which standard junction tree-like techniques cannot exploit.
The paper
Compiling Relational Bayesian
Networks for Exact Inference (2004)
shows how to compile such models into "circuits" for evaluating Z, the
partition function.
As we showed in class, derivatives of the log partition function can
be used to compute moments (marginals).
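To recall the identity from class: if p(x) = exp(sum_i theta_i
phi_i(x)) / Z(theta), then

  \frac{\partial \log Z(\theta)}{\partial \theta_i} = \mathbb{E}_p[\phi_i(x)],

so differentiating the compiled circuit yields the marginals.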
The goal of this project is to reproduce the results in this paper.
This project would be suitable for someone who knows about BDDs and
other compact representations for boolean functions.
Exact inference in SCFG-like graphical models
Stochastic context free grammars (SCFGs), when represented naively as
graphical models, have large tree width, yet we know exact inference
can be done in O(N^3) time, where N is the length of the sentence,
because SCFGs have a lot of context specific independence (CSI).
Case factor
diagrams (related to BDDs) provide a way of representing and
exploiting this CSI.
The goal of this project is to implement this algorithm,
and check it gives the same results as the standard inside-outside
algorithm for SCFGs.
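For reference, the quantity to match is the standard inside recursion
(for a grammar in Chomsky normal form), where beta(A,i,j) is the
probability that nonterminal A derives words i through j:

  \beta(A, i, j) = \sum_{A \to B\,C} \sum_{k=i}^{j-1}
      P(A \to B\,C)\, \beta(B, i, k)\, \beta(C, k+1, j),
  \qquad \beta(A, i, i) = P(A \to w_i),

with Z = beta(S, 1, N) for start symbol S.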
An alternative, theoretical project would be to compare and contrast
this representation/algorithm with that of Darwiche et al mentioned
above (for compiling first-order circuits). Is one provably better
than the other?
Approximate inference in discrete-state models
Comparing approximate inference for Ising models
Ising models are discrete-state 2D grid-structured MRFs with pairwise potentials.
Many models (Bayes nets, Markov nets, factor graphs) can be converted
into this form.
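Concretely, with states x_i in {-1, +1}, the joint is

  p(x) = \frac{1}{Z} \exp\Big( \sum_{(i,j) \in E} J_{ij}\, x_i x_j
      + \sum_i h_i x_i \Big),

where E is the set of grid edges.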
Exact inference is intractable, so people have tried various
approximations, such as mean field, loopy belief propagation (BP),
generalized belief propagation, Gibbs sampling, Rao-Blackwellised
MCMC, Swendsen-Wang, graph
cuts, etc.
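As a concrete baseline, a Gibbs sampler for the model above (uniform
J and h, free boundaries) is only a few lines of Matlab; a minimal,
unoptimized sketch:

  % Single-site Gibbs sampling for a 2D Ising model, states +/-1.
  function X = ising_gibbs(n, J, h, nsweeps)
  X = sign(rand(n) - 0.5);             % random +/-1 initialization
  for sweep = 1:nsweeps
    for i = 1:n
      for j = 1:n
        s = 0;                         % sum of the (up to 4) neighbors
        if i > 1, s = s + X(i-1,j); end
        if i < n, s = s + X(i+1,j); end
        if j > 1, s = s + X(i,j-1); end
        if j < n, s = s + X(i,j+1); end
        p1 = 1 / (1 + exp(-2*(J*s + h)));  % P(X(i,j) = +1 | neighbors)
        X(i,j) = 2*(rand < p1) - 1;
      end
    end
  end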
The goal of this project is to empirically compare these methods on
some MRF models (using other people's code),
and to make a uniform Matlab interface to all the functions (so they can
be interchanged in a plug-n-play fashion).
To test, you can use an MRF with random parameters, but it would be better to team up with
someone who is trying to learn MRF parameters from real data (see
below).
I have a vectorized Matlab implementation of loopy BP I can give
you.
Talya Meltzer gave me her C++ code (with a Matlab wrapper)
for mean field, loopy BP, generalized BP, Gibbs sampling and
Swendsen-Wang, which I've put
here.
Code for RB-MCMC can be obtained from Firas Hamze or Nando de Freitas.
C++ graphcuts code is available (without matlab interface)
here.
Some related papers you should read first:
Implement approximate inference for Ising models in matlab
It would be useful, for both teaching and research,
to have Matlab implementations (preferably as part of BNT)
of one or more of the following
algorithms, all of which have public implementations in other, less
readable/portable languages.
- Generalized Belief Propagation
(I have C++ code from Marshall Tappen and Talya Meltzer).
- Gibbs sampling.
(BNT contains C code from Bhaskara Marthi for Bayes nets.)
- Swendsen-Wang
(I have C++ code from Talya Meltzer).
- Graphcuts (C++ code is available from the URL above).
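To give the flavor of Swendsen-Wang, here is a sketch of one sweep
for the ferromagnetic case (J > 0, states +/-1): bonds between equal
neighbors are activated with probability 1 - exp(-2J), and each
resulting cluster of bonded sites is flipped with probability 1/2.

  function X = ising_sw_sweep(X, J)
  % One Swendsen-Wang sweep: activate bonds, find clusters via
  % union-find, flip each cluster with probability 1/2.
  [n, m] = size(X);
  p = 1 - exp(-2*J);                       % bond activation probability
  parent = 1:(n*m);                        % union-find forest over sites
  for i = 1:n
    for j = 1:m
      k = sub2ind([n m], i, j);
      if i < n && X(i,j) == X(i+1,j) && rand < p
        parent(find_root(sub2ind([n m], i+1, j))) = find_root(k);
      end
      if j < m && X(i,j) == X(i,j+1) && rand < p
        parent(find_root(sub2ind([n m], i, j+1))) = find_root(k);
      end
    end
  end
  labels = arrayfun(@find_root, 1:(n*m));  % cluster label of each site
  doflip = rand(1, n*m) < 0.5;             % one coin flip per cluster root
  X(doflip(labels)) = -X(doflip(labels));  % flip the chosen clusters
    function r = find_root(a)
      while parent(a) ~= a
        parent(a) = parent(parent(a));     % path halving
        a = parent(a);
      end
      r = a;
    end
  end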
Vision
Discriminative Fields for
Modeling Spatial Dependencies in Natural Images is about applying
2D conditional random fields (CRFs) to classify image regions as
containing man-made structure or not, on the basis of texture.
The goal of this project is to reproduce the results in the NIPS 2003 paper.
Useful links:
-
labeled
training data.
-
C++
graphcuts code
for approximate inference
- My Matlab
CRF code
- Carl Rasmussen's matlab conjugate
gradient minimizer (better than using netlab or matlab
optimization toolbox)
- Intro
to CRFs by Hanna Wallach
- Maxent
page, includes code
- Steerable
pyramid matlab code, possibly useful set of image features
- Matlab
wavelet toolbox, possibly useful set of image features .
- Paper on CRFs for sign
detection, J. Weinman, 2004
-
Markov Random Field Modeling in Computer Vision, S. Z. Li, 1995.
(I have a hardcopy of the 2001 edition.)
- G. Winkler, "Image Analysis, Random Fields, and MCMC Methods",
2nd edition, 2003.
- Markov random fields and images, P. Perez.
CWI Quarterly, 11(4):413-437, 1998. Review article.
2D CRFs for satellite image classification
The goal of this project is to classify pixels in satellite image data
into classes like field vs road vs forest, using
MRFs/CRFs (see above), or some other technique.
Some possibly useful links:
Layered video
The goal of this project is to reproduce the results in the following paper:
Transformed hidden Markov models: Estimating mixture models of images
and inferring spatial transformations in video sequences (CVPR
2000).
Note that
Brendan Frey
has Matlab code for transformation invariant EM on his home page.
See also
Real-time
On-line Learning of Transformed Hidden Markov Models from Video,
Nemanja Petrovic, Nebojsa Jojic, Brendan J. Frey, Thomas S. Huang,
AIstats 2003, which is 10,000 times faster!
Language / biosequence analysis
1D CRFs for shallow parsing
Factored 1D CRFs have been used
by Sutton
et al (ICML 2004) to do combined part-of-speech tagging and
named-entity extraction in a unified model.
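For orientation, recall the basic linear-chain CRF, which the
factored version extends with multiple interacting label chains:

  p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_k
      \lambda_k f_k(y_{t-1}, y_t, x, t) \Big).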
The goal of this project is to reproduce their experiments.
Some useful links:
RB particle filtering for SLAM
Rao-Blackwellised particle filtering (RBPF) is a way of combining sampling
and exact inference, such as Kalman filters.
In my NIPS 2000 paper, I showed how to use it for solving the
Simultaneous Localization And Mapping (SLAM) problem in mobile
robotics.
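The Rao-Blackwellisation rests on the fact that the map components
become independent once the robot's path is known:

  p(x_{1:t}, m \mid z_{1:t}) = p(x_{1:t} \mid z_{1:t})
      \prod_k p(m_k \mid x_{1:t}, z_{1:t}),

so only the path x_{1:t} is sampled, and each map factor is updated
analytically inside every particle.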
Roland Wenzel made a great Matlab implementation/demo of this
as his project for Nando's class. (I can give you his Matlab code.)
The goal of this project is to extend the code so it uses
the optimal proposal distribution
(lookahead RBPF), as described in
"Diagnosis by a waiter and a Mars explorer", de Freitas et al, 2004.
Junction tree related projects
Thin junction tree filter
Exact filtering is intractable if the treewidth gets too large,
as usually happens after a few time steps.
This is true
even in jointly Gaussian models (which
are usually solved with Kalman filters), e.g., in the problem of
simultaneous localization and mapping (SLAM) for mobile robots.
Mark Paskin
suggested a way of using a junction tree to represent the belief state,
and then "thinning" the cliques if they get too big, thereby keeping
inference tractable.
He has code for this
here.
The goal of this project is to compare TJTF with other methods, such as
Boyen-Koller or loopy belief propagation, on some standard DBNs (i.e.,
not the SLAM problem).
Exact inference in dynamic object oriented Bayes nets
OOBNs provide syntactic sugar for specifying Bayes nets with
repetitive structure.
DOOBNs extend this to time series data.
Most work has focused on representational (syntax) issues,
but
Bangsø
et al (2003)
discuss how to exploit the repetitive structure to
perform fast triangulation of the junction tree in dynamic domains.
The goal of this project is to implement this algorithm.
This project would be suitable for someone who wants to learn junction tree
really well.
N-best hypotheses
As we discussed in class, it is very useful to find the top N most
probable assignments of hidden variables.
A method to do this for arbitrary Bayes nets was presented by
Nilsson and
Lauritzen (1998).
(See Cowell book p101.)
(A simplification to the case of HMMs is
here.)
Both of these algorithms involve traceback, which is hard to implement
in a parallel, distributed fashion.
A simpler alternative approach is discussed by
Yanover
& Weiss.
Matlab code for this is
here.
This algorithm also has the advantage that it can be extended to the
approximate domain.
The goal of this project is to implement these algorithms (preferably
in Matlab) and compare them.
(You could do this for the marginal MAP problem
(max-sum-product), or just the simpler regular MAP/MPE problem
(max-product).)
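In symbols, the two problems are

  \text{MAP/MPE:}\;\; x^* = \arg\max_x \prod_c \psi_c(x_c),
  \qquad
  \text{marginal MAP:}\;\; x_M^* = \arg\max_{x_M} \sum_{x_S}
      \prod_c \psi_c(x_c),

where x_M are the maximized variables and x_S the ones summed out.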
Forward propagation / backward sampling
It is easy to draw samples from the joint posterior if it is
represented as a junction tree; it's similar to, but simpler than,
finding the N best hypotheses. (See Cowell book p101.)
The goal of this project is to implement this function in BNT, for
general graphical models, and for dynamic Bayes nets.
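For the special case of a chain (an HMM), this is just forward
filtering followed by backward sampling. A minimal Matlab sketch,
where B(k,t) holds the local evidence P(obs_t | S_t = k):

  % Forward filtering / backward sampling for a discrete HMM:
  % draws one sample path from P(S(1:T) | observations).
  % prior(k) = P(S_1 = k); A(j,k) = P(S_t = k | S_{t-1} = j).
  function s = hmm_ffbs(prior, A, B)
  [K, T] = size(B);
  alpha = zeros(K, T);                      % filtered posteriors
  alpha(:,1) = normz(prior(:) .* B(:,1));
  for t = 2:T
    alpha(:,t) = normz((A' * alpha(:,t-1)) .* B(:,t));
  end
  s = zeros(1, T);
  s(T) = sample_discrete(alpha(:,T));       % sample the final state
  for t = T-1:-1:1                          % backward sampling pass
    s(t) = sample_discrete(normz(alpha(:,t) .* A(:, s(t+1))));
  end

  function p = normz(p)
  p = p / sum(p);

  function k = sample_discrete(p)
  k = find(cumsum(p) >= rand, 1);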
Finding good triangulation / elimination orders
The goal of this project is to implement
the optimal elimination ordering algorithm of
Arnborg
(1987)
which takes O(n^{k+2}) time, where k is the treewidth.
In addition, you should compare it to
the algorithm in the paper
A Practical Relaxation of Constant-Factor Treewidth Approximation Algorithms
by Hopkins and Darwiche (2002).
See also
Bodlaender's
repository of treewidth problems.
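For calibration, it is also worth coding the standard greedy min-fill
heuristic (sketched below) and measuring how far it is from optimal on
your test graphs:

  % Greedy min-fill elimination ordering (a baseline heuristic, not the
  % optimal algorithm above). G is a symmetric 0/1 adjacency matrix.
  function order = minfill_order(G)
  n = size(G, 1);
  G = logical(G); G(1:n+1:end) = false;     % ignore self-loops
  left = true(1, n);                        % not-yet-eliminated vertices
  order = zeros(1, n);
  for step = 1:n
    best = inf;
    for v = find(left)
      nb = find(G(v,:) & left);             % uneliminated neighbors of v
      fill = (nnz(~G(nb,nb)) - numel(nb)) / 2;  % missing neighbor pairs
      if fill < best, best = fill; bestv = v; end
    end
    nb = find(G(bestv,:) & left);
    G(nb,nb) = true; G(1:n+1:end) = false;  % add the fill-in edges
    order(step) = bestv;
    left(bestv) = false;
  end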
Decision making
Planning as probabilistic inference
The paper
Planning by probabilistic inference (Attias, AI/Stats 2003)
suggests a way to reduce planning to inferring the probabilities of
action nodes such that a given goal state is observed to be satisfied.
It offers an interesting alternative to standard (intractable) POMDP
solution techniques.
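Roughly, instead of optimizing a policy, one conditions on the goal
being reached and infers the action nodes, something like

  a^*_{1:T} = \arg\max_{a_{1:T}} \, p(a_{1:T} \mid s_1, s_T = \mathrm{goal})

(this is a paraphrase of the paper's setup, not its exact notation).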
The goal of this project is to reproduce the experiments in this
paper.
Approximate solutions to influence diagrams
The goal of this project is to come up with ways of performing
approximate "inference" in influence diagrams, so as to come up with
approximate policies (by analogy with approximate marginals computed
e.g., by loopy belief propagation).
The recommended paper for influence diagrams is
Limited
information influence diagrams, which provides a way of representing factored,
finite-horizon POMDPs, with all information arcs shown explicitly (so
one can relax the no-forgetting assumption).
Note: BNT contains code for LIMIDs.
POMDPs
Partially observed Markov decision processes
Learning POMDP structure so as to maximize utility
Hoey &
Little (CVPR 04) show how to learn the state space and parameters
of a POMDP so as to maximize utility in a visual face
gesture recognition task.
(This is similar to the concept of "utile distinctions" developed in
Andrew
McCallum's PhD thesis.)
The goal of this project is to reproduce Hoey's work in a simpler
(non-visual) domain, such as McCallum's driving task.
Learning
Data Perturbation for Escaping Local Maxima in Learning
Data Perturbation
is a way of sequentially reweighting the training data of any batch learning
algorithm so as to escape local maxima; it is a kind of "dynamic local
search" algorithm.
Matlab and C++ code (and the paper) can be found
here.
The goal of this project is to reproduce all the experiments in the
original AAAI 2002 paper (greedy structure learning,
parametric EM, structural EM, and logistic regression), using the
above code, plus BNT, and anything else you need to write.
Structure learning from Dependency nets
Dependency nets are models of the form prod_i P(X_i|X_{N_i}), where
N_i are the neighbors of node i.
They do not necessarily define a unique distribution, but are easy to
learn, because you can use a decision tree to learn each conditional.
One idea is to convert this into a Bayes net.
The goal of this project is to reproduce the results in this paper
Learning
Bayesian Networks From Dependency Networks: A Preliminary Study,
Geoff Hulten, David Maxwell Chickering, David Heckerman.
AIstats 2003
Other
Input-Output HMMs
An IO-HMM
is a simple extension to an HMM, where the state transitions are
conditioned on an input (rather like a POMDP, but without the
utilities).
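That is, the joint over hidden states Q and outputs Y, given the
input sequence U, is

  p(q_{1:T}, y_{1:T} \mid u_{1:T}) = p(q_1 \mid u_1)\, p(y_1 \mid q_1, u_1)
      \prod_{t=2}^{T} p(q_t \mid q_{t-1}, u_t)\, p(y_t \mid q_t, u_t).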
The goal of this project is to implement inference and learning in
IOHMMs. Note that 90% of the code already exists in BNT; there are
just a few details left to write (such as adding a CPD class for
mixtures of softmaxes).
Better yet would be to implement IOHMMs on top of my HMM package,
which is much faster than BNT.
Kevin Murphy
Last modified: Tue Sep 14 13:31:38 PDT 2004