CS 540 (Machine learning) projects, Fall 2008

Criteria

Projects should satisfy one (or more!) of the following criteria:

Be a novel result of interest to the machine learning community (ideally publishable in a conference or workshop)
Be a novel result of interest to some other community such as biology or vision; in this case, machine learning must play a critical role in establishing the result.
Be a demo (e.g., figures for my book) of use to other people who teach machine learning. As an example, reproducing the figures of "On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes", would be a fine project (except Matt is already doing it!). More examples are given below.
Be a software package (e.g., an extension to PMTK) of use to other people within machine learning. As an example, in a previous year, Mayram wrote matbugs, a matlab interface to BUGS, which has proved quite popular to the wider community. (She basically just ported some R code to Matlab, but also made a demo, with Alfred Pang, to illustrate her understanding of Gibbs sampling.)

Some suggested projects are given below.

Timeline

Please submit a 1 page proposal (pdf) by Tue 4 Nov. Either bring hard copy to class or send me an email. This should specify what your deliverable will be, why it is interesting/ useful/ challenging, and cite 2 or 3 relevant papers, and any other resources (data sets, software pacakages) you need. Then give a brief timeline of how you will meet your goal in the alloted time. If I see a problem with your suggestion, I will let you know by the end of that week.

You are required to make a 5 minute presentation on your project. The tentative plan is to meet 10-12.30 on Thur Dec 4th in CS 146 (LCI lab meeting room). If you cannot make that time (eg. if you have an exam), please email me asap. (29 students, 5 minutes each, plus 5 minute break makes 2.5 hours.)

On or before Mon 15th December, please submit by email a zip file (Lastname.Firstname.zip) containing your report (pdf file, 8 pages in nips format, optional appendices allowed), and any accompanying code and data. For your code, please make sure it is documented (ideally with a web page, e.g., produced using the "publish" command), and that it runs on a windows machine, so I can easily reproduce all your figures.

You will be evaluated on the quality of your work: novelty and importance of ideas, clarity of written report and oral presentation, and speed, readability and ease of use of your code. (The latter is especially important for projects that are designed to be a tutorial/demo for use with PMTK or the book.) Your report/talk should be written in a way that can be understood by other students in the class. The goal is to be precise but concise. Please do not explain basic material that we have covered, instead focus on your novel contribution. If you are applying ML to some problem domain such as biology, please do not spend too long explaining the background to the problem, but again focus on what you did and why it is interesting from an ML point of view.

You can work in groups of up to 2 people. Group projects need to be more ambitious than singleton projects. You will each get the same grade for your project, so you must decide how to split the work between you (e.g., who will present, who will write the report, who will code).

You have 6 weeks to do the project. This is in lieu of homeworks, so you are expected to spend about 50-60 hours on it.

Chosen projects

Last	First	Project
Aghaeepour	Nima	17A: compare Broet06 to Shah06. Work with Xi Chen.
Cecchetto	Benjamin	21: EM for MixBernoulli
Chao	WIlliam	18: probabilistic matrix factorization
Chen	Xi	16: re-implement Broet06 in Matlab; work with Aghaeepour.
Chen	Bo	14: Apply Sminchisescu's code to some vision data for estimating 3d pose. Joint with Bahman Khanloo.
Cortes	Adrian	Compare mixtures of Gaussians and Students, using EM and VBEM, on flow cytometry data. Work with Michelle Xia (who is doing project 11 for a different course).
Crisan	Anamaria	17B: Reimpement Pique-Regi08 and then compare to Shah06. Joint with: Jing Xiang.
Duvenaud	David	Compare various conditional density estimators on discrete data, and see if they can be used to predict consequences of novel actions (inputs).
Gamroth	Catherine	12: VBEM for hierarchical mixture of epxerts (Bishop03).
Goya	Rodrigo	Sparse logistic regression with Lp penalty with p<1
Goyal	Amit	Determining influence in social networks
Gupta	Ankur	Learning depth from single monocular images
Hall	Joseph	Predict 2d time series of human arousal and valence given EKG, EMG etc data using various off-the-shelf methods.
Khabaz	Mohammad	8,9: Minka's hybrid generative and discriminative method, applied to object detection.
Khanloo	Bahman	See Chen, Bo.
Kini	Pranab	Predict which file will be accessed next given a transaction log and knowledge of the directory structure.
Lacerda	Gustavo	15: predict next 48 hours of energy use given history of use (ignoring covariates).
Likhtarov	Anton	3: FBMP with Q>2 mixture of Gaussians prior.
Lim	Raymond	23: Compare mixture density networks (in netlab) to mixture of experts (using Martin's code).
Mahendran	Nimilan	10: Bayesian interpretation of semisupervised learning
Malekesmaeli	Mani	1,2: Implement Holmes's probabilistic kNN method and modify to use L1 instead of forwards selection. Joint with Valizadeh.
Nguyen	Thanh	Predicting people's restaurant ratings (along multiple aspects) from review text (features already available).
Ning	Kaida	Clustering data where features have an unknown sign.
Saeedi	Noushin	See Khabbaz.
Valizadeh	Amir	See Malekesmaeli.
Xiang	Jing	See Crisan.

Some suggested projects

Projects have been classified into the following types:

methods: novel ML methodology, will require mathematical derivation and implementation, could result in a paper.
pedagogy: reproducing existing results, possibly on synthetic data, so you can learn something, and make useful demos/ figures for the book/ PMTK, to help other people learn in the future.
app: application to some kind of data. Simply downloading software off the web and applying it is not sufficient for a project. However, you can do do some kind of comparative experiment of many methods. Alternatively, you can team up with someone doing a pedagogy project, where you apply the method to real data (which often involves substantial work).

I have also listed methods that I think each project may need. Some acronyms you may not (yet!) know: EM = expectation maximization, VB EM = variational Bayes EM, MCMC = Markov chain Monte Carlo, Opt = optimization, LinAlg = linear algebra, etc.

In the table below, I have rated each project on a scale of 1-3, where 1 = easy and 3 = hard. This is very subjective; I call a project hard if you have to derive new math or algorithms by yourself; I call it medium if you just have to implement someone else's (possibly tricky) methods; and I call it easy if you just have to implement something simple. Harder projects can be split between 2 people. (Note that it is better to do an easy project well than a hard project poorly; but you will be rewarded for tackling something hard.)

# Type Diff Methods Description Papers Software Student

1 Pedagogy 2 Logreg Opt Holmes03 gives a probabilistic model for k-nearest neighbors, and optimizes the weights by maximizing pseudo likelihood using IRLS. He shows this works better than using CV. Goal: reproduce his results in sec 3.2-3.3. Holmes03 . Malekesmaeli and Valizadeh

2 Methods 2 L1 Opt Holmes03 uses forwards selection to choose k in his probabilistic kNN. Goal: use L1 instead. (Team up with project 1.) Holmes03 . Malekesmaeli and Valizadeh

3 Methods 3 EM, Bayes Schniter08 gives a fast approximate Bayesian ifnerence algorithm (based on tree search) for finding the most probable subsets in a sparse linear regression model. He puts a mixture of Gaussians prior on the weights: mean=std=0 (delta function at 0) if the weight is off, mean=0, std>0 if weight is on (feature is selected). He presents an EM algorithm to learn the hyper-params of the prior. This can be extended to a mixture of K Gaussians, eg. mean=-1 or +1 if weights is on. Goal: extend the empirical Bayes EM algorithm to handle K>2. See if this improves empirical results on his standard test example. Schniter08 matlab Anton Likhtarov

4 Methods 3 BIC, linalg Goal: extend Schniter08 (see project 3) to logistic regression (cf. Hans07). The trick is to devise a fast way to update the BIC objective function. Schniter08, Hans07 matlab .

5 Methods 3 Bayes, linalg Goal: extend Schniter08 (see project 3) so that the prior on weights that are on is a T distribution instead of a mixture of Gaussians. (If the component is off, it's still a delta function at 0.) The trick is to devise a fast way to update the sufficient statistics for a T distribution. Schniter08 matlab .

6 Pedagogy 1 Bayes, BIC, CV In PMLmidterm.pdf Fig 6.3, I plot the loglikelihood vs lambda for ridge regression, as well as the AIC, BIC, CV scores, and use this to select lambda. Goal: Make analogous plots for logistic regression. Also plot the marginal likelihood (midtermbook p131) vs lambda for linear regression. Try comparing Bayesian model selection vs BIC and CV across a range of linear regression data sets. PML PMTK .

7 Pedagogy 1 Bayes, linreg, logreg In Minka00, he shows that it "pays off to be Bayesian" in the context of LDA, in the sense that using the posterior predictive density (a T distribution), instead of a plugin approximation (a Gaussian distribution), results in higher classification accuracy, especially when d/n is large. Goal: reproduce the results in his paper. Then make a similar demo for linear and logistic regression. (For the latter, use a Laplace approximation.) Minka00 PMTK .

8 Pedagogy 1 Gen vs discrim, Bayes Lasserre06 gives a Bayesian interpretation of the generative vs discriminative debate. Goal: reproduce the figures in section 3. Lasserre06, Bishop07 . Saeedi and Khabbaz

9 App 2 PCA, GMM, computer vision Lasserre06 discusses how to do object detection using labeled and unlabeled data. Goal: reproduce the results in section 4. (Best done in conjunction with project 8). Lasserre06 . Saeedi and Khabbaz

10 Pedagogy 3 Semisupervised, MCMC Liang07 gives a Bayesian interpretation of semisupervised learning. Goal: reproduce the example figures. Liang07 . Mahendran

11 Pedagogy 2 VBEM Svensen04 describes the VBEM for mixtures of student Ts. Goal: implement and and reproduce his examples. Svensen04 matlab for VBEM for GMM. Michelle Xia

12 Pedagogy 2 VBEM, mixtures of experts Bishop03 describes the VBEM for mixtures of experts. Goal: implement and and reproduce his examples. (EM code for mixexp by David Martin is provided). Bishop03 matlab code for vanilla EM. Catherine Gamroth

13 Pedagogy 2 Mixtures of experts Bo08 describes a different Bayesian method for mixtures of experts. Goal: use his code to reproduce the 2d demos here. Bo08,Sminchisescu07 matlab Khanloo and Chen

14 App 2 Computer vision, Mixtures of experts Apply Bo08 to the 3d tracking problem. Bo08,Sminchisescu07 matlab Khanloo and Chen

15 App 2 (Non-parametric) Regression Energy prediction problem: see Matt Brown's guest lecture (Thur 30 Oct). Matt's slides Some will be provided Lacerda

16 Pedagogy 2 MCMC, Gaussian MRFs, Kalman filter Broet06 describes a mixture model in which the mixture components are spatially correlated via a 1D Gaussian random field. BUGS code is provided, so all you've had to do is re-implement it in Matlab. The key trick is to use block Gibbs sampling rather than single-site Gibbs. You can modify my Kalman filter code for this. Compare to a vanilla HMM. Broet06 their BUGS code, my Kalman filter code. my HMM code. Xi Chen

17A App 1 HMMs, bioinformatics Compare Broet06 to Shah06. (in collaboration with Sohrab Shah at the BC Cancer Agency). Best done in collaboration with project 16. Broet06,Shah06 HMM for aCGH. Nima Aghaeepour

17B App 2 Sparse priors, EM, bioinformatics Pique-Regi08 describes the application of sparse linear models to change point detectiong in aCGH data. Goal: implement the algorithm in Matlab and compare results to their executable. Then compare to our HMM method on real data (in collaboration with Sohrab Shah at the BC Cancer Agency). Pique-Regi08 C, binary Anamaria Crisan and Jing Xiang

18 Pedagogy 2 EM, SVD, optimization Salak08 describes a probabilistic matrix factorization algorithm. Goal: implement the algorithm in Matlab and apply to toy data, comparing to SVD. Salak08 . Will Chao

19 App 3 Large data, EM, SVD, optimization Goal: apply Salak08 to netflix data. Best done in collaboration with project 18. Salak08 . .

20 Pedagogy 3 EM, optimization, GMM, mixtures of factor analysers Salak03 describes a way to speedup EM, and applies to many different models/ data sets. Goal: implement in Matlab and reproduce his figures, and also the movies mixtures of FA and mixtures of Gaussians. Salak03 . .

21 Pedagogy 1 EM On p44 of pml7sep08, fig 3.5b shows the means of each cluster center for binary data. Implement EM for mixtures of Bernoullis (Bishop06 p446) and then regenerate this figure. Plot the marginalized density p(x) = sum_z p(x|z) which will be a blurry image; this is the analog of fig 3.4. Apply mixBernoulli to the mnist digits, and see if it can discover that there are 10 classes (or maybe 11, for the two types of 7s). Try using BIC to choose K, and also try setting K to be large, and initializing the Dirichlet hyper-parameters to be small (Bishop06 p480), to see if it kills off unwanted components. Try clustering binary bag-of-word text data as well; compare your clusters to known class labels. Integrate your code with PMTK. PML, Bishop06 PMTK Benjamin Cecchetto

22 Pedagogy 1 EM Bishop06 p669 describes a mixture of experts model where the mixing indicator is not conditioned on the input. The standard model has a x->z arc. Implement Bishop's simplified model, and compare it to the standard model, using both linear and logistic regression experts. Reproduce his Figs 14.8-14.10. Implement model selection for the number of mixture components, and the number of levels in the hierarchy. Integrate your code with PMTK. Bishop06 matlab code by David Martin. .

23 Pedagogy 1 EM, neural nets A mixture density network is a nonlinear mixture of experts model. It is implemented in netlab. Goal: Compare to regular mixtures of experts on some simple classification and regression problems. Bishop94 netlab Raymond Lim

#	Type	Diff	Methods	Description	Papers	Software	Student
1	Pedagogy	2	Logreg Opt	Holmes03 gives a probabilistic model for k-nearest neighbors, and optimizes the weights by maximizing pseudo likelihood using IRLS. He shows this works better than using CV. Goal: reproduce his results in sec 3.2-3.3.	Holmes03	.	Malekesmaeli and Valizadeh
2	Methods	2	L1 Opt	Holmes03 uses forwards selection to choose k in his probabilistic kNN. Goal: use L1 instead. (Team up with project 1.)	Holmes03	.	Malekesmaeli and Valizadeh
3	Methods	3	EM, Bayes	Schniter08 gives a fast approximate Bayesian ifnerence algorithm (based on tree search) for finding the most probable subsets in a sparse linear regression model. He puts a mixture of Gaussians prior on the weights: mean=std=0 (delta function at 0) if the weight is off, mean=0, std>0 if weight is on (feature is selected). He presents an EM algorithm to learn the hyper-params of the prior. This can be extended to a mixture of K Gaussians, eg. mean=-1 or +1 if weights is on. Goal: extend the empirical Bayes EM algorithm to handle K>2. See if this improves empirical results on his standard test example.	Schniter08	matlab	Anton Likhtarov
4	Methods	3	BIC, linalg	Goal: extend Schniter08 (see project 3) to logistic regression (cf. Hans07). The trick is to devise a fast way to update the BIC objective function.	Schniter08, Hans07	matlab	.
5	Methods	3	Bayes, linalg	Goal: extend Schniter08 (see project 3) so that the prior on weights that are on is a T distribution instead of a mixture of Gaussians. (If the component is off, it's still a delta function at 0.) The trick is to devise a fast way to update the sufficient statistics for a T distribution.	Schniter08	matlab	.
6	Pedagogy	1	Bayes, BIC, CV	In PMLmidterm.pdf Fig 6.3, I plot the loglikelihood vs lambda for ridge regression, as well as the AIC, BIC, CV scores, and use this to select lambda. Goal: Make analogous plots for logistic regression. Also plot the marginal likelihood (midtermbook p131) vs lambda for linear regression. Try comparing Bayesian model selection vs BIC and CV across a range of linear regression data sets.	PML	PMTK	.
7	Pedagogy	1	Bayes, linreg, logreg	In Minka00, he shows that it "pays off to be Bayesian" in the context of LDA, in the sense that using the posterior predictive density (a T distribution), instead of a plugin approximation (a Gaussian distribution), results in higher classification accuracy, especially when d/n is large. Goal: reproduce the results in his paper. Then make a similar demo for linear and logistic regression. (For the latter, use a Laplace approximation.)	Minka00	PMTK	.
8	Pedagogy	1	Gen vs discrim, Bayes	Lasserre06 gives a Bayesian interpretation of the generative vs discriminative debate. Goal: reproduce the figures in section 3.	Lasserre06, Bishop07	.	Saeedi and Khabbaz
9	App	2	PCA, GMM, computer vision	Lasserre06 discusses how to do object detection using labeled and unlabeled data. Goal: reproduce the results in section 4. (Best done in conjunction with project 8).	Lasserre06	.	Saeedi and Khabbaz
10	Pedagogy	3	Semisupervised, MCMC	Liang07 gives a Bayesian interpretation of semisupervised learning. Goal: reproduce the example figures.	Liang07	.	Mahendran
11	Pedagogy	2	VBEM	Svensen04 describes the VBEM for mixtures of student Ts. Goal: implement and and reproduce his examples.	Svensen04	matlab for VBEM for GMM.	Michelle Xia
12	Pedagogy	2	VBEM, mixtures of experts	Bishop03 describes the VBEM for mixtures of experts. Goal: implement and and reproduce his examples. (EM code for mixexp by David Martin is provided).	Bishop03	matlab code for vanilla EM.	Catherine Gamroth
13	Pedagogy	2	Mixtures of experts	Bo08 describes a different Bayesian method for mixtures of experts. Goal: use his code to reproduce the 2d demos here.	Bo08,Sminchisescu07	matlab	Khanloo and Chen
14	App	2	Computer vision, Mixtures of experts	Apply Bo08 to the 3d tracking problem.	Bo08,Sminchisescu07	matlab	Khanloo and Chen
15	App	2	(Non-parametric) Regression	Energy prediction problem: see Matt Brown's guest lecture (Thur 30 Oct).	Matt's slides	Some will be provided	Lacerda
16	Pedagogy	2	MCMC, Gaussian MRFs, Kalman filter	Broet06 describes a mixture model in which the mixture components are spatially correlated via a 1D Gaussian random field. BUGS code is provided, so all you've had to do is re-implement it in Matlab. The key trick is to use block Gibbs sampling rather than single-site Gibbs. You can modify my Kalman filter code for this. Compare to a vanilla HMM.	Broet06	their BUGS code, my Kalman filter code. my HMM code.	Xi Chen
17A	App	1	HMMs, bioinformatics	Compare Broet06 to Shah06. (in collaboration with Sohrab Shah at the BC Cancer Agency). Best done in collaboration with project 16.	Broet06,Shah06	HMM for aCGH.	Nima Aghaeepour
17B	App	2	Sparse priors, EM, bioinformatics	Pique-Regi08 describes the application of sparse linear models to change point detectiong in aCGH data. Goal: implement the algorithm in Matlab and compare results to their executable. Then compare to our HMM method on real data (in collaboration with Sohrab Shah at the BC Cancer Agency).	Pique-Regi08	C, binary	Anamaria Crisan and Jing Xiang
18	Pedagogy	2	EM, SVD, optimization	Salak08 describes a probabilistic matrix factorization algorithm. Goal: implement the algorithm in Matlab and apply to toy data, comparing to SVD.	Salak08	.	Will Chao
19	App	3	Large data, EM, SVD, optimization	Goal: apply Salak08 to netflix data. Best done in collaboration with project 18.	Salak08	.	.
20	Pedagogy	3	EM, optimization, GMM, mixtures of factor analysers	Salak03 describes a way to speedup EM, and applies to many different models/ data sets. Goal: implement in Matlab and reproduce his figures, and also the movies mixtures of FA and mixtures of Gaussians.	Salak03	.	.
21	Pedagogy	1	EM	On p44 of pml7sep08, fig 3.5b shows the means of each cluster center for binary data. Implement EM for mixtures of Bernoullis (Bishop06 p446) and then regenerate this figure. Plot the marginalized density p(x) = sum_z p(x\|z) which will be a blurry image; this is the analog of fig 3.4. Apply mixBernoulli to the mnist digits, and see if it can discover that there are 10 classes (or maybe 11, for the two types of 7s). Try using BIC to choose K, and also try setting K to be large, and initializing the Dirichlet hyper-parameters to be small (Bishop06 p480), to see if it kills off unwanted components. Try clustering binary bag-of-word text data as well; compare your clusters to known class labels. Integrate your code with PMTK.	PML, Bishop06	PMTK	Benjamin Cecchetto
22	Pedagogy	1	EM	Bishop06 p669 describes a mixture of experts model where the mixing indicator is not conditioned on the input. The standard model has a x->z arc. Implement Bishop's simplified model, and compare it to the standard model, using both linear and logistic regression experts. Reproduce his Figs 14.8-14.10. Implement model selection for the number of mixture components, and the number of levels in the hierarchy. Integrate your code with PMTK.	Bishop06	matlab code by David Martin.	.
23	Pedagogy	1	EM, neural nets	A mixture density network is a nonlinear mixture of experts model. It is implemented in netlab. Goal: Compare to regular mixtures of experts on some simple classification and regression problems.	Bishop94	netlab	Raymond Lim

References

@techreport{Bishop94,
    author = "C. M. Bishop",
    title = "Mixture Density Networks",
    number = "NCRG 4288",
    year = "1994",
    institution = "Aston University",
    url = "citeseer.ist.psu.edu/bishop94mixture.html"
}

@inproceedings{Bishop03,
 author = "C. Bishop and M.  Svens\'en",
 year = 2003,
 title = {{Bayesian hierarchical mixtures of experts}},
 booktitle = uai,
 url = "http://research.microsoft.com/~cmbishop/downloads/Bishop-UAI-VHME.pdf"
}

@book{Bishop06,
 author = "C. Bishop",
  title = "Pattern recognition and machine learning",
  year = 2006,
  publisher = "Springer",
}

@inproceedings{Bishop07,
 author = "C. Bishop and J. Lasserre",
 title = "Generative or Discriminative? getting the best of both
worlds",
 booktitle = "Bayesian Statistics 8",
 year = 2007
 url = "http://research.microsoft.com/~cmbishop/downloads/Bishop-Valencia-07.pdf"
}

@INPROCEEDINGS{Bo08,
  author = {L. Bo and C. Sminchisescu and A. Kanaujia and D. Metaxas},
  title = {{Fast Algorithms for Large Scale Conditional 3D Prediction}},
  booktitle = {IEEE International Conference on Computer Vision and Pattern Recognition},
  optpages = {},
  year = {2008},
  pdf = {http://sminchisescu.ins.uni-bonn.de/papers/cvpr08.pdf}
}

@article{Broet06,
 author = "P. Broet and S. Richardson",
 title = {{Detection of gene copy number changes in CGH microarrays using a spatially correlated mixture model}},
  year  = 2006,
  journal = "Bioinformatics",
}

@INPROCEEDINGS{Ghahramani94a,
    author = {Zoubin Ghahramani and Michael I. Jordan},
    title = {Supervised learning from incomplete data via an EM approach},
    booktitle = nips,
    year = {1994},
    pages = {120--127}
}

@INPROCEEDINGS{Ghahramani94b,
    author = {Zoubin Ghahramani},
    title = {Solving inverse problems using an EM approach to density estimation},
    booktitle = {Proceedings of the 1993 Connectionist Models Summer School},
    year = {1994},
    pages = {316--323}
}


@article{Hans07,
 title = {{Shotgun Stochastic Search for "Large p" Regression}},
 author = "C Hans and A.  Dobra and M. West",
journal = jasa,
 volume =102,
 number = 478,
 pages = "507--516",
 year = 2007,
 month = "June"
}

@article{Holmes03,
 title = "Likelihood inference in nearest-neighbour classification models",
 author = "C.  Holmes and N. Adams",
 journal = "Biometrika",
 year = 2003,
 volume = 90,
 number = 1,
 pages = "99"
}


@article{Kannan08,
 title = "Fast Transformation-Invariant Component Analysis",
 journal = ijcv,
 volume = 77,
 year = 2008,
 number = 1,
 pages = "87--101"
}

@inproceedings{Lasserre06,
 title = "Principled Hybrids of Generative and Discriminative Models",
 authr = "J. Lasserre and C. Bishop and T. Minka",
 year = 2006, 
 booktitle = cvpr
}

@article{Liang07,
 author = "F. Liang and S. Mukherjee and M. West",
 title = "Understanding the use of
unlabelled data in predictive modelling",
 journal = "Statistical Science",
 volume = 22,
 year = 2007,
 pages = "189-205"
}

@techreport{Minka00,
  author = "T. Minka",
  title = "Inferring a {G}aussian distribution",
  institution = "MIT",
  year = 2000
}

@article{Pique-Regi08,
 title = {{Sparse representation and Bayesian detection of genome copy
number alterations from microarray data}},
 author = "R.  Pique-Regi and J. Monso-Varona and A. Ortega and
R. Seeger and T.  Triche and S. Asgharzadeh",
 journal = "Bioinformatics",
 year = 2008,
 volume = 24,
 number = 3,
 pages = "309-318"
 url = "http://biron.usc.edu/~piquereg/GADA/"
}

@InProceedings{Salak03,
  title =	"Adaptive Overrelaxed Bound Optimization Methods",
  author =	"Ruslan Salakhutdinov and Sam Roweis", 
  booktitle =	"Proceedings of the International Conference on Machine
                 Learning",
  year = 	"2003",
  volume =	"20",
  pages =	"664--671",
 url = "http://www.cs.toronto.edu/~rsalakhu/papers/overrelax.pdf"
}

@InProceedings{Salak08,
author=         "Ruslan Salakhutdinov and Andriy Mnih",
title=          "Probabilistic Matrix Factorization",
booktitle=      "Advances in Neural Information Processing Systems",
volume=         "20",
year=           "2008",
url = "http://www.cs.toronto.edu/~rsalakhu/papers/nips07_pmf.pdf"
}


@article{Schniter08,
 author = "P. Schniter and L. C. Potter and J. Ziniel",
 title = {{Fast Bayesian Matching Pursuit: Model Uncertainty and
Parameter Estimation for Sparse Linear Models}},
 year = 2008,
 journal = "IEEE Trans. on Signal Processing",
 url = "http://www.ece.osu.edu/~zinielj/FBMP_TransSP.pdf"
}

@article{Shah06,
 title = {{Integrating copy number polymorphisms into array CGH analysis using a
robust HMM}},
 author = "S. Shah and  X. Xuang and R. DeLeeuw and M. Khojasteh and
W. Lam and R. Ng and K. Murphy",
journal = "Bioinformatics",
 volume = 22,
 number = 14,
 pages = "e431-e439",
 year = 2006,
 month = "July"
}

@ARTICLE{Sminchisescu07,
  author = {C. Sminchisescu and A. Kanaujia and D. Metaxas},
  title = {{$BM^3E$: Discriminative Density Propagation for Visual Tracking}},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year = {2007},
  pdf = {http://sminchisescu.ins.uni-bonn.de/papers/SKM-pami07.pdf}
}


@article{Svensen04,
 author = "M. Svensen and C. Bishop",
 year = 2004,
 title = "Robust Bayesian mixture modelling",
 journal = "Neurocomputing",
 volume = 64,
 pages = "235--252"
}

@inproceedings{Zivkovic06,
authors="Z.Zivkovic and J.J. Verbeek",
title ="Transformation invariant component analysis for binary images",
booktitle=cvpr,
pages="254-259",
year=2006,
}

21 Pedagogy 1 EM Frey03 describes a transformation invariant mixture of Gaussians model. Zivkovic06 describes a transformation invariant mixtures of PCA for binary data. The goal is to come up a simple example of something in the middle: a transformation invariant mixtures of Bernoullis. Apply this to translated mnist digtis, as in Zivkovic06. Frey03,Zivkovic06 tmgEM.m, tmglPx.m binary PCA.zip . 21 Pedagogy 1 EM, FFT Kannan08 shows how to fit a mixture of Kannan08 matlab . --> 22 Pedagogy 1 EM Ghahramani94 describes how to use mixture models to do supervised learning even when some of the inputs are missing. Goal: implement the algorithm and reproduce his examples (in both papers). Integrate your code with PMTK. Ghahramani94a,b . . -->