# Final project for CS532c

There are several kinds of possible projects:
• Application of graphical models to some domain. This could either be your own research problem, or you could try reproducing results of someone else's paper. In this case, you may use an existing software package, such as my BNT, Netica, or other ones listed here.

• Implement an existing algorithm. In this case, you are strongly encouraged to implement the code in such a way as to make it (re)usable by others. For example, you could write it in Matlab and integrate it with my BNT toolbox.

• Develop a new algorithm. In this case, you will need to demonstrate (theoretically and/or empirically) why your algorithm is better than other techniques.
Some possible project suggestions are below. (Some of these may also be suitable for masters theses.)

In addition to the textbooks handed out in class, you may find the book "Probabilistic Networks and Expert Systems" by Cowell, Dawid, Lauritzen and Spiegelhalter (1999) helpful. I refer to this below as "the Cowell book".

# Bayesian estimation

## Matlab interface to BUGS

BUGS (Bayesian Updating using Gibbs Sampling) is an extremely popular package for doing Gibbs sampling in hierarchical Bayesian models (i.e., Bayes nets where the hidden nodes are parameters, not discrete random variables). BUGS is a standalone executable for unix or windows (the latter is called WinBUGS). It is also callable from the statistical language R using bugsR, and from Matlab using this code. The goal of this project is to develop a way to represent a hierarchical Bayesian model in matlab (e.g., an extension of BNT). One should then be able to solve the model using BUGS or using some other program (such as VIBES). This project would be suitable for someone who wants to learn more about BUGS and mainstream Bayesian statistics.

## Comparing VIBES and BUGS

VIBES (Variational Inference for Bayesian networks) is an alternative to BUGS in that it uses a deterministic mean field approximation. VIBES is open source Java; there is also a Matlab interface. The goal of this project is to compare speed vs accuracy of the mean field and the Gibbs sampling methods on various problems in Bayesian estimation. (For discrete random variables, e.g. Ising models, mean field is usually much faster, and loopy belief propagation is even better, but for continuous (non Gaussian) random variables, it's not so clear.) See also Matt Beal's page for variational Bayes stuff.

## Re-implement VIBES in Matlab

VIBES is written in Java. It would be useful, for both teaching and research, to have a Matlab version. The goal of this project is to reimplement (some fraction of) VIBES in Matlab. For example, you could restrict attention to the mean field version, which does not exploit tractable substructure. This project would be useful for someone who wants to learn Variational Bayes really well.

## Approximate sequential Bayesian updating

Cowell et al (1994) studied the problem of approximate sequential Bayesian parameter estimation for discrete Bayes nets when there is missing data. (See also Cowell (1996) and Cowell book p204.) The method is closely related to moment matching. The goal of this project is to implement this algorithm for the general exponential family (including graphical models). A good starting point might be Rebel, a toolbox for approximate sequential Bayesian updating in the Gaussian case (i.e., using Kalman filters and extensions).

## Expectation propagation (EP)

EP generalizes approximate sequential Bayesian updating (see above) to the offline (batch) case, allowing it to "go back" and fix up errors made earlier in the process. The goal of this project is to develop a toolbox for EP. First, develop a method for moment matching filtering for the exponential family (see project above); then add EP methods on top. (This is hard!)

# Speeding up inference

## Automated code synthesis for inference

Many inference algorithms, such as junction tree and factor graph message passing, are slow because they require manipulating (multiplying, dividing and marginalizing) multi-dimensional arrays (factors) using general purpose procedures; similar comments apply to sampling algorithms, such as particle filtering. The goal of this project is to automatically generate optimized matlab or C code for doing inference in a specific model (with a specific evidence pattern, i.e., you know which nodes will be observed; assume that any hidden node can be queried). Thus you can "compile" most of the complex indexing into simple code for manipulating static arrays (whose size is known at compile time). Some related papers you should read
This would make a good project for someone interested in software engineering, theorem proving, symbolic algebra, etc.

## Efficient exact inference in first order probabilistic models

First-order Bayes nets have a lot of context specific independence (CSI), which standard junction tree-like techniques cannot exploit. The paper Compiling Relational Bayesian Networks for Exact Inference (2004) shows how to compile such models into "circuits" for evaluating Z, the partition function. As we showed in class, derivatives of the log partition function can be used to compute moments (marginals). The goal of this project is to reproduce the results in this paper. This project would be suitable for someone who knows about BDDs and other compact representations for boolean functions.

## Exact inference in SCFG-like graphical models

Stochastic context free grammars (SCFGs), when represented naively as graphical models, have large tree width, yet we know exact inference can be done in O(N^3) time, where N is the length of the sentence, because SCFGs have a lot of context specific independence (CSI). Case factor diagrams (related to BDDs) provide a way of representing and exploiting this CSI. The goal of this project is to implement this algorithm, and check it gives the same results as the standard inside-outside algorithm for SCFGs. An alternative, theoretical, project would be to compare/contrast this representation/ algorithm with that of Darwiche et al mentioned above (for compiling first order circuits). Is one provably better than another?

# Approximate inference in discrete-state models

## Comparing approximate inference for Ising models

Ising models are discrete-state 2D grid-structured MRFs with pairwise potentials. Many models (Bayes nets, Markov nets, factor graphs) can be converted into this form. Exact inference is intractable, so people have tried various approximations, such as mean field, loopy belief propagation (BP), generalized belief propagation, Gibbs sampling, Rao-Blackwellised MCMC, Swendsen-Wang, graph cuts, etc.

The goal of this project is to empirically compare these methods on some MRF models (using other people's code), and and to make a uniform matlab interface to all the functions (so they can be interchanged in a plug-n-play fashion). To test, you can use an MRF with random parameters, but it would be better to team up with someone who is trying to learn MRF parameters from real data (see below).

I have a vectorized Matlab implementation of loopy BP I can give you. Talya Meltzer give me her C++ code (with a Matlab wrapper) for mean field, loopy BP, generalized BP, Gibbs sampling and Swendsen-Wang, which I've put here. Code for RB-MCMC can be obtained from Firas Hamze or Nando de Freitas. C++ graphcuts code is available (without matlab interface) here.

Some related papers you should read first:

## Implement approximate inference for Ising models in matlab

It would be useful, for both teaching and research, to have Matlab implementations (preferably as part of BNT) of one or more of the following algorithms, all of which have public implementations in other, less readable/portable languages.
• Generalized Belief Propagation (I have C++ code from Marshall Tappen and Talya Meltzer).
• Gibbs sampling. (BNT contains C code from Bhaskara Marthi for Bayes nets.)
• Swendsen-Wang (I have C++ code from Talya Meltzer).
• Graphcuts (C++ code is available from the URL above).

# Vision

## 2D CRFs for visual texture classification

Discriminative Fields for Modeling Spatial Dependencies in Natural Images is about applying 2D conditional random fields (CRFs) for classifying image regions as containing "man-made building" or not, on the basis of texture. The goal of this project is to reproduce the results in the NIPS 2003 paper. Useful links:

## 2D CRFs for satellite image classification

The goal of this project is to classify pixels in satellite image data into classes like field vs road vs forest, using MRFs/CRFs (see above), or some other technique. Some possibly useful links:

## Layered video

The goal of this project is to reproduce the results in the following paper: Transformed hidden Markov models: Estimating mixture models of images and inferring spatial transformations in video sequences (CVPR 2000). Note that Brendan Frey has Matlab code for transformation invariant EM on his home page. See also Real-time On-line Learning of Transformed Hidden Markov Models from Video, Nemanja Petrovic, Nebojsa Jojic, Brendan J. Frey, Thomas S, Huang, AIstats 2003, which is 10,000 times faster!

# Language/ biosequence analysis

## 1D CRFs for shallow parsing

Factored 1D CRFs have been used by Sutton et al (ICML 2004) to do combined part-of-speech tagging and named-entity extraction in a unified model. The goal of this project is to reproduce their experiments. Some useful links:

# RB particle filtering for SLAM

Rao-Blackwellised particle filtering (RBPF) is a way of combining sampling and exact inference, such as Kalman filters. In my NIPS 2000 paper, I showed how to use it for solving the Simultaneous Localization And Mapping (SLAM) problem in mobile robotics. Roland Wenzel made a great Matlab implementation/demo of this for his class project for Nando's class. (I can give you his Matlab code.) The goal of this project is to extend the code so it uses the optimal proposal distibution (lookahead RBPF), as described in "Diagnosis by a waiter and a Mars explorer", de Freitas et al, 2004.

# Junction tree related projects

## Thin junction tree filter

Exact filtering is intractable if the treewidth gets too large, as usually happens after a few time steps. This is true even in jointly Gaussian models (which are usually solved with Kalman filters) e.g.,in the problem of simultaneous localizaiton and mapping (SLAM) for mobile robots. Mark Paskin suggested a way of using a junction tree to represent the belief state, and then "thinning" the cliques if they get too big, thereby keeping inference tractable. He has code for this here The goal of this project is to compare TJTF with other methods, such as Boyen-Koller or loopy belief propagation, on some standard DBNs (i.e., not the SLAM problem).

## Exact inference in dynamic object oriented Bayes nets

OOBNs provide syntactic sugar for specifying Bayes nets with repetitive structure. DOOBNs extend this to time series data. Most work has focused on representational (syntax) issues, but Bangso et al (2003) discusses how to exploit the repetitive structure to perform fast triangulation of the junction tree in dynamic domains. The goal of this project is to implement this algorithm. This project would be suitable for someone who wants to learn junction tree really well.

## N-best hypotheses

As we discussed in class, it is very useful to find the top N most probable assignments of hidden variables. A method to do this for arbitrary bayes nets was presented by Nilson and Lauritzen (1998). (See Cowell book p101.) (A simplification to the case of HMMs is here.) Both of these algorithms involve traceback, which is hard to implement in a parallel, distribution fashion. A simpler alternative approach is discussed by Yanover & Weiss. Matlab code for this is here. This algorithm also has the advantage that it can be extended to the approximate domain. The goal of this project is to implement these algorithms (preferably in Matlab) and compare them. (You could do this for the marginal MAP problem (max-sum-product), or just the simpler regular MAP/MPE problem (max-product).)

## Forward propagation/ backward sampling

It is easy to draw samples from the joint posterior if it is represented as a junction tree; it's similar to, but simpler than, finding the N best hypotheses. (See Cowell book p101.) The goal of this project is to implement this function in BNT, for general graphical models, and for dynamic Bayes nets.

## Finding good triangulation/ elimination orders

The goal of this project is to implement the optimal elimination orderign algorithm of Arnborg (1987) which takes O(n^{k+2}) time, where k is the treewidth. In addition, you should compare it to the algorithm in the paper A Practical Relaxation of Constant-Factor Treewidth Approximation Algorithms by Hopins and Darwiche (2002).

# Decision making

## Planning as probabilistic inference

The paper Planning by probabilistic inference (Attias, AI/Stats 2003) suggests a way to reduce planning to inferring the probabilities of action nodes such that a given goal state is observed to be satisfied. It offers an interesting alternative to standard (intractable) POMDP solution techniques. The goal of this project is to reproduce the experiments in this paper.

## Approximate solutions to influence diagrams

The goal of this project is to come up with ways of performing approximate "inference" in influence diagrams, so as to come up with approximate policies (by analogy with approximate marginals computed e.g., by loopy belief propgation). The recommended paper for influence diagrams is Limited information influence diagrams, which provides a way of representing factored, finite-horizon POMDPs, with all information arcs shown explicitly (so one can relax the no-forgetting assumption). Note: BNT contains code for LIMIDs.

## POMDPs

Partially observed Markov decision processes

## Learning POMDP structure so as to maximize utility

Hoey & Little (CVPR 04) show how to learn the state space, and parameters, of a POMDP so as to maximize utility in a visual face gesture recognition task. (This is similar to the concept of "utile distinctions" developed in Andrew McCallum's PhD thesis.) The goal of this project is to reproduce Hoey's work in a simpler (non-visual) domain, such as McCallum's driving task.

# Learning

## Data Perturbation for Escaping Local Maxima in Learning

Data Perturbation is a way of sequentially reweighting the training data of any batch learning algorithm so as to escape local maxima; it is a kind of "dynamic local search" algorithm. Matlab and C++ code (and the paper) can be found here. The goal of this project is to reproduce all the experiments in the original AAAI 2002 paper (greedy structure learning, parametric EM, structural EM, and logistic regression), using the above code, plus BNT, and anything else you need to write.

## Structure learning from Dependency nets

Dependency nets are models of the form prod_i P(X_i|X_{N_i}). They do not necessarily define a unique distribution, but are easy to learn, because you can use a decision tree to learn each conditional. One idea is to convert this into a Bayes net. The goal of this project is to reproduce the results in this paper Learning Bayesian Networks From Dependency Networks: A Preliminary Study, Geoff Hulten, David Maxwell Chickering, David Heckerman. AIstats 2003

# Other

## Input-Output HMMs

An IO-HMM is a simple extension to an HMM, where the state transitions are conditioned on an input (rather like a POMDP, but without the utilities). The goal of this project is to implement inference and learning in IOHMMs. Note that 90% of the code already exists in BNT; there are just a few details left to write (such as adding a CPD class for mixture of softmax). Better yet would be to implement IOHMMs on top of my HMM package, which is much faster than BNT.
Kevin Murphy