- Be a novel result of interest to the machine learning community
(ideally publishable in a conference or workshop)
- Be a novel result of interest to some other community such as
biology or vision; in this case, machine
learning must play a critical role in establishing the result.
- Be a demo (e.g., figures for my book) of use to other people who
teach machine learning. As an example, reproducing the figures of
"On Discriminative vs. Generative Classifiers: A comparison of
logistic regression and Naive Bayes" would be a fine project (except
Matt is already doing it!). More examples are given below.
- Be a software package (e.g., an extension to PMTK) of use to other people within machine learning. As an example, in a previous year, Maryam wrote matbugs, a Matlab interface to BUGS, which has proved quite popular with the wider community. (She basically just ported some R code to Matlab, but also made a demo, with Alfred Pang, to illustrate her understanding of Gibbs sampling.)

Please submit a 1 page proposal (pdf) by **Tue 4 Nov**. Either bring hard
copy to class or send me an email.
This should specify what your deliverable will be and why it is
interesting/useful/challenging; cite 2 or 3 relevant papers, and list
any other resources (data sets, software packages) you need.
Then give a brief timeline of how you will meet your goal in the
allotted time.
If I see a problem with your suggestion, I will let you know by the end
of that week.

You are required to make a 5 minute
presentation on your project. The tentative plan is to meet 10-12.30
on **Thur Dec 4th** in CS 146 (LCI lab meeting room). If you cannot make
that time (e.g., if you have an exam), please email me asap.
(29 students, 5 minutes each, plus 5 minute break makes 2.5 hours.)

On or before **Mon 15th December**,
please submit by email a
zip file (Lastname.Firstname.zip)
containing your report (pdf file, 8 pages in
NIPS format,
optional appendices allowed),
and any accompanying code and data.
For your code, please make sure it is documented (ideally with a web
page, e.g., produced using the "publish" command),
and that it runs on a Windows machine, so I
can easily reproduce all your figures.

You will be evaluated on the quality of your work: novelty and importance of ideas, clarity of written report and oral presentation, and speed, readability and ease of use of your code. (The latter is especially important for projects that are designed to be a tutorial/demo for use with PMTK or the book.) Your report/talk should be written in a way that can be understood by other students in the class. The goal is to be precise but concise. Please do not explain basic material that we have covered, instead focus on your novel contribution. If you are applying ML to some problem domain such as biology, please do not spend too long explaining the background to the problem, but again focus on what you did and why it is interesting from an ML point of view.

You can work in groups of up to 2 people. Group projects need to be more ambitious than singleton projects. You will each get the same grade for your project, so you must decide how to split the work between you (e.g., who will present, who will write the report, who will code).

You have 6 weeks to do the project. This is in lieu of homeworks, so you are expected to spend about 50-60 hours on it.

Last | First | Project |
---|---|---|
Aghaeepour | Nima | 17A: compare Broet06 to Shah06. Work with Xi Chen. |
Cecchetto | Benjamin | 21: EM for MixBernoulli |
Chao | William | 18: probabilistic matrix factorization |
Chen | Xi | 16: re-implement Broet06 in Matlab; work with Aghaeepour. |
Chen | Bo | 14: apply Sminchisescu's code to some vision data for estimating 3d pose. Joint with Bahman Khanloo. |
Cortes | Adrian | Compare mixtures of Gaussians and Students, using EM and VBEM, on flow cytometry data. Work with Michelle Xia (who is doing project 11 for a different course). |
Crisan | Anamaria | 17B: reimplement Pique-Regi08 and then compare to Shah06. Joint with Jing Xiang. |
Duvenaud | David | Compare various conditional density estimators on discrete data, and see if they can be used to predict the consequences of novel actions (inputs). |
Gamroth | Catherine | 12: VBEM for hierarchical mixture of experts (Bishop03). |
Goya | Rodrigo | Sparse logistic regression with Lp penalty, p<1. |
Goyal | Amit | Determining influence in social networks. |
Gupta | Ankur | Learning depth from single monocular images. |
Hall | Joseph | Predict 2d time series of human arousal and valence given EKG, EMG, etc. data using various off-the-shelf methods. |
Khabbaz | Mohammad | 8,9: Minka's hybrid generative and discriminative method, applied to object detection. |
Khanloo | Bahman | See Chen, Bo. |
Kini | Pranab | Predict which file will be accessed next given a transaction log and knowledge of the directory structure. |
Lacerda | Gustavo | 15: predict next 48 hours of energy use given history of use (ignoring covariates). |
Likhtarov | Anton | 3: FBMP with Q>2 mixture of Gaussians prior. |
Lim | Raymond | 23: compare mixture density networks (in netlab) to mixture of experts (using Martin's code). |
Mahendran | Nimilan | 10: Bayesian interpretation of semisupervised learning. |
Malekesmaeli | Mani | 1,2: implement Holmes's probabilistic kNN method and modify it to use L1 instead of forwards selection. Joint with Valizadeh. |
Nguyen | Thanh | Predicting people's restaurant ratings (along multiple aspects) from review text (features already available). |
Ning | Kaida | Clustering data where features have an unknown sign. |
Saeedi | Noushin | See Khabbaz. |
Valizadeh | Amir | See Malekesmaeli. |
Xiang | Jing | See Crisan. |

- methods: novel ML methodology, will require mathematical derivation and implementation, could result in a paper.
- pedagogy: reproducing existing results, possibly on synthetic data, so you can learn something, and make useful demos/figures for the book/PMTK, to help other people learn in the future.
- app: application to some kind of data. Simply downloading software off the web and applying it is not sufficient for a project. However, you can do some kind of comparative experiment across many methods. Alternatively, you can team up with someone doing a pedagogy project, and apply the method to real data (which often involves substantial work).

I have also listed the methods that I think each project may need. Some acronyms you may not (yet!) know: EM = expectation maximization, VBEM = variational Bayes EM, MCMC = Markov chain Monte Carlo, Opt = optimization, LinAlg = linear algebra, etc.

In the table below, I have rated each project on a scale of 1-3, where 1 = easy and 3 = hard. This is very subjective; I call a project hard if you have to derive new math or algorithms by yourself; I call it medium if you just have to implement someone else's (possibly tricky) methods; and I call it easy if you just have to implement something simple. Harder projects can be split between 2 people. (Note that it is better to do an easy project well than a hard project poorly; but you will be rewarded for tackling something hard.)

# | Type | Diff | Methods | Description | Papers | Software | Student |
---|---|---|---|---|---|---|---|
1 | Pedagogy | 2 | Logreg, Opt | Holmes03 gives a probabilistic model for k-nearest neighbors, and optimizes the weights by maximizing the pseudo-likelihood using IRLS. He shows this works better than using CV. Goal: reproduce his results in sec 3.2-3.3. | Holmes03 | . | Malekesmaeli and Valizadeh |
2 | Methods | 2 | L1, Opt | Holmes03 uses forwards selection to choose k in his probabilistic kNN. Goal: use L1 instead. (Team up with project 1.) | Holmes03 | . | Malekesmaeli and Valizadeh |
3 | Methods | 3 | EM, Bayes | Schniter08 gives a fast approximate Bayesian inference algorithm (based on tree search) for finding the most probable subsets in a sparse linear regression model. He puts a mixture of Gaussians prior on the weights: mean=std=0 (a delta function at 0) if the weight is off; mean=0, std>0 if the weight is on (feature is selected). He presents an EM algorithm to learn the hyper-parameters of the prior. This can be extended to a mixture of K Gaussians, e.g., mean=-1 or +1 if the weight is on. Goal: extend the empirical Bayes EM algorithm to handle K>2. See if this improves empirical results on his standard test example. | Schniter08 | matlab | Anton Likhtarov |
4 | Methods | 3 | BIC, linalg | Goal: extend Schniter08 (see project 3) to logistic regression (cf. Hans07). The trick is to devise a fast way to update the BIC objective function. | Schniter08, Hans07 | matlab | . |
5 | Methods | 3 | Bayes, linalg | Goal: extend Schniter08 (see project 3) so that the prior on the weights that are on is a T distribution instead of a mixture of Gaussians. (If the component is off, it is still a delta function at 0.) The trick is to devise a fast way to update the sufficient statistics for a T distribution. | Schniter08 | matlab | . |
6 | Pedagogy | 1 | Bayes, BIC, CV | In PMLmidterm.pdf Fig 6.3, I plot the loglikelihood vs lambda for ridge regression, as well as the AIC, BIC and CV scores, and use this to select lambda. Goal: make analogous plots for logistic regression. Also plot the marginal likelihood (midtermbook p131) vs lambda for linear regression. Try comparing Bayesian model selection vs BIC and CV across a range of linear regression data sets. | PML | PMTK | . |
7 | Pedagogy | 1 | Bayes, linreg, logreg | Minka00 shows that it "pays off to be Bayesian" in the context of LDA, in the sense that using the posterior predictive density (a T distribution), instead of a plugin approximation (a Gaussian distribution), results in higher classification accuracy, especially when d/n is large. Goal: reproduce the results in his paper. Then make a similar demo for linear and logistic regression. (For the latter, use a Laplace approximation.) | Minka00 | PMTK | . |
8 | Pedagogy | 1 | Gen vs discrim, Bayes | Lasserre06 gives a Bayesian interpretation of the generative vs discriminative debate. Goal: reproduce the figures in section 3. | Lasserre06, Bishop07 | . | Saeedi and Khabbaz |
9 | App | 2 | PCA, GMM, computer vision | Lasserre06 discusses how to do object detection using labeled and unlabeled data. Goal: reproduce the results in section 4. (Best done in conjunction with project 8.) | Lasserre06 | . | Saeedi and Khabbaz |
10 | Pedagogy | 3 | Semisupervised, MCMC | Liang07 gives a Bayesian interpretation of semisupervised learning. Goal: reproduce the example figures. | Liang07 | . | Mahendran |
11 | Pedagogy | 2 | VBEM | Svensen04 describes VBEM for mixtures of Student Ts. Goal: implement it and reproduce his examples. | Svensen04 | matlab for VBEM for GMM | Michelle Xia |
12 | Pedagogy | 2 | VBEM, mixtures of experts | Bishop03 describes VBEM for mixtures of experts. Goal: implement it and reproduce his examples. (EM code for mixexp by David Martin is provided.) | Bishop03 | matlab code for vanilla EM | Catherine Gamroth |
13 | Pedagogy | 2 | Mixtures of experts | Bo08 describes a different Bayesian method for mixtures of experts. Goal: use his code to reproduce the 2d demos. | Bo08, Sminchisescu07 | matlab | Khanloo and Chen |
14 | App | 2 | Computer vision, mixtures of experts | Apply Bo08 to the 3d tracking problem. | Bo08, Sminchisescu07 | matlab | Khanloo and Chen |
15 | App | 2 | (Non-parametric) regression | Energy prediction problem: see Matt Brown's guest lecture (Thur 30 Oct). | Matt's slides | Some will be provided | Lacerda |
16 | Pedagogy | 2 | MCMC, Gaussian MRFs, Kalman filter | Broet06 describes a mixture model in which the mixture components are spatially correlated via a 1D Gaussian random field. BUGS code is provided, so all you have to do is re-implement it in Matlab. The key trick is to use block Gibbs sampling rather than single-site Gibbs. You can modify my Kalman filter code for this. Compare to a vanilla HMM. | Broet06 | their BUGS code, my Kalman filter code, my HMM code | Xi Chen |
17A | App | 1 | HMMs, bioinformatics | Compare Broet06 to Shah06 (in collaboration with Sohrab Shah at the BC Cancer Agency). Best done in collaboration with project 16. | Broet06, Shah06 | HMM for aCGH | Nima Aghaeepour |
17B | App | 2 | Sparse priors, EM, bioinformatics | Pique-Regi08 describes the application of sparse linear models to change point detection in aCGH data. Goal: implement the algorithm in Matlab and compare results to their executable. Then compare to our HMM method on real data (in collaboration with Sohrab Shah at the BC Cancer Agency). | Pique-Regi08 | C, binary | Anamaria Crisan and Jing Xiang |
18 | Pedagogy | 2 | EM, SVD, optimization | Salak08 describes a probabilistic matrix factorization algorithm. Goal: implement the algorithm in Matlab and apply it to toy data, comparing to SVD. | Salak08 | . | Will Chao |
19 | App | 3 | Large data, EM, SVD, optimization | Goal: apply Salak08 to the Netflix data. Best done in collaboration with project 18. | Salak08 | . | . |
20 | Pedagogy | 3 | EM, optimization, GMM, mixtures of factor analysers | Salak03 describes a way to speed up EM, and applies it to many different models/data sets. Goal: implement it in Matlab and reproduce his figures, and also the movies (mixtures of FA and mixtures of Gaussians). | Salak03 | . | . |
21 | Pedagogy | 1 | EM | On p44 of pml7sep08, fig 3.5b shows the means of each cluster center for binary data. Implement EM for mixtures of Bernoullis (Bishop06 p446) and then regenerate this figure. Plot the marginalized density p(x) = sum_z p(x|z), which will be a blurry image; this is the analog of fig 3.4. Apply mixBernoulli to the MNIST digits, and see if it can discover that there are 10 classes (or maybe 11, for the two types of 7s). Try using BIC to choose K, and also try setting K to be large and initializing the Dirichlet hyper-parameters to be small (Bishop06 p480), to see if this kills off unwanted components. Try clustering binary bag-of-words text data as well; compare your clusters to known class labels. Integrate your code with PMTK. | PML, Bishop06 | PMTK | Benjamin Cecchetto |
22 | Pedagogy | 1 | EM | Bishop06 p669 describes a mixture of experts model where the mixing indicator is not conditioned on the input. (The standard model has an x->z arc.) Implement Bishop's simplified model, and compare it to the standard model, using both linear and logistic regression experts. Reproduce his Figs 14.8-14.10. Implement model selection for the number of mixture components, and for the number of levels in the hierarchy. Integrate your code with PMTK. | Bishop06 | matlab code by David Martin | . |
23 | Pedagogy | 1 | EM, neural nets | A mixture density network is a nonlinear mixture of experts model; it is implemented in netlab. Goal: compare it to regular mixtures of experts on some simple classification and regression problems. | Bishop94 | netlab | Raymond Lim |
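To give a feel for how little code an "easy" project like 21 involves, here is a rough NumPy sketch of the EM updates for a mixture of Bernoullis (Bishop06 p446). This is only an illustration, not the deliverable (which should be Matlab code integrated with PMTK); the function name, initialization, and smoothing constants are my own choices.

```python
# Sketch of EM for a mixture of K Bernoullis on binary data X of shape (N, D).
# Illustrative only; the project deliverable should be Matlab/PMTK code.
import numpy as np

def em_mix_bernoulli(X, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)              # mixing weights
    mu = rng.uniform(0.25, 0.75, (K, D))  # Bernoulli means, kept away from 0/1
    for _ in range(n_iter):
        # E step: responsibilities r[n,k] ∝ pi_k * prod_d mu_kd^x_nd (1-mu_kd)^(1-x_nd),
        # computed in log space to avoid underflow on high-dimensional data.
        log_lik = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T  # (N, K)
        log_r = np.log(pi) + log_lik
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M step: weighted MLEs, with a tiny smoothing term so mu stays in (0, 1).
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X + 1e-3) / (Nk[:, None] + 2e-3)
    return pi, mu, r
```

Plotting each row of `mu` as an image gives the cluster-mean figures, and `pi @ mu` gives the blurry marginal p(x) mentioned in the project description.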

@techreport{Bishop94, author = "C. M. Bishop", title = "Mixture Density Networks", number = "NCRG 4288", year = 1994, institution = "Aston University", url = "citeseer.ist.psu.edu/bishop94mixture.html" }

@inproceedings{Bishop03, author = "C. Bishop and M. Svens\'en", title = {{Bayesian hierarchical mixtures of experts}}, booktitle = uai, year = 2003, url = "http://research.microsoft.com/~cmbishop/downloads/Bishop-UAI-VHME.pdf" }

@book{Bishop06, author = "C. Bishop", title = "Pattern recognition and machine learning", publisher = "Springer", year = 2006 }

@inproceedings{Bishop07, author = "C. Bishop and J. Lasserre", title = "Generative or Discriminative? Getting the best of both worlds", booktitle = "Bayesian Statistics 8", year = 2007, url = "http://research.microsoft.com/~cmbishop/downloads/Bishop-Valencia-07.pdf" }

@inproceedings{Bo08, author = "L. Bo and C. Sminchisescu and A. Kanaujia and D. Metaxas", title = {{Fast Algorithms for Large Scale Conditional 3D Prediction}}, booktitle = "IEEE International Conference on Computer Vision and Pattern Recognition", year = 2008, pdf = "http://sminchisescu.ins.uni-bonn.de/papers/cvpr08.pdf" }

@article{Broet06, author = "P. Broet and S. Richardson", title = {{Detection of gene copy number changes in CGH microarrays using a spatially correlated mixture model}}, journal = "Bioinformatics", year = 2006 }

@inproceedings{Ghahramani94a, author = "Zoubin Ghahramani and Michael I. Jordan", title = "Supervised learning from incomplete data via an EM approach", booktitle = nips, year = 1994, pages = "120--127" }

@inproceedings{Ghahramani94b, author = "Zoubin Ghahramani", title = "Solving inverse problems using an EM approach to density estimation", booktitle = "Proceedings of the 1993 Connectionist Models Summer School", year = 1994, pages = "316--323" }

@article{Hans07, author = "C. Hans and A. Dobra and M. West", title = {{Shotgun Stochastic Search for "Large p" Regression}}, journal = jasa, volume = 102, number = 478, pages = "507--516", year = 2007, month = "June" }

@article{Holmes03, author = "C. Holmes and N. Adams", title = "Likelihood inference in nearest-neighbour classification models", journal = "Biometrika", volume = 90, number = 1, pages = "99", year = 2003 }

@article{Kannan08, title = "Fast Transformation-Invariant Component Analysis", journal = ijcv, volume = 77, number = 1, pages = "87--101", year = 2008 }

@inproceedings{Lasserre06, author = "J. Lasserre and C. Bishop and T. Minka", title = "Principled Hybrids of Generative and Discriminative Models", booktitle = cvpr, year = 2006 }

@article{Liang07, author = "F. Liang and S. Mukherjee and M. West", title = "Understanding the use of unlabelled data in predictive modelling", journal = "Statistical Science", volume = 22, pages = "189--205", year = 2007 }

@techreport{Minka00, author = "T. Minka", title = "Inferring a {G}aussian distribution", institution = "MIT", year = 2000 }

@article{Pique-Regi08, author = "R. Pique-Regi and J. Monso-Varona and A. Ortega and R. Seeger and T. Triche and S. Asgharzadeh", title = {{Sparse representation and Bayesian detection of genome copy number alterations from microarray data}}, journal = "Bioinformatics", volume = 24, number = 3, pages = "309--318", year = 2008, url = "http://biron.usc.edu/~piquereg/GADA/" }

@inproceedings{Salak03, author = "Ruslan Salakhutdinov and Sam Roweis", title = "Adaptive Overrelaxed Bound Optimization Methods", booktitle = "Proceedings of the International Conference on Machine Learning", volume = 20, pages = "664--671", year = 2003, url = "http://www.cs.toronto.edu/~rsalakhu/papers/overrelax.pdf" }

@inproceedings{Salak08, author = "Ruslan Salakhutdinov and Andriy Mnih", title = "Probabilistic Matrix Factorization", booktitle = "Advances in Neural Information Processing Systems", volume = 20, year = 2008, url = "http://www.cs.toronto.edu/~rsalakhu/papers/nips07_pmf.pdf" }

@article{Schniter08, author = "P. Schniter and L. C. Potter and J. Ziniel", title = {{Fast Bayesian Matching Pursuit: Model Uncertainty and Parameter Estimation for Sparse Linear Models}}, journal = "IEEE Trans. on Signal Processing", year = 2008, url = "http://www.ece.osu.edu/~zinielj/FBMP_TransSP.pdf" }

@article{Shah06, author = "S. Shah and X. Xuan and R. DeLeeuw and M. Khojasteh and W. Lam and R. Ng and K. Murphy", title = {{Integrating copy number polymorphisms into array CGH analysis using a robust HMM}}, journal = "Bioinformatics", volume = 22, number = 14, pages = "e431--e439", year = 2006, month = "July" }

@article{Sminchisescu07, author = "C. Sminchisescu and A. Kanaujia and D. Metaxas", title = {{$BM^3E$: Discriminative Density Propagation for Visual Tracking}}, journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence", year = 2007, pdf = "http://sminchisescu.ins.uni-bonn.de/papers/SKM-pami07.pdf" }

@article{Svensen04, author = "M. Svensen and C. Bishop", title = "Robust Bayesian mixture modelling", journal = "Neurocomputing", volume = 64, pages = "235--252", year = 2004 }

@inproceedings{Zivkovic06, author = "Z. Zivkovic and J. J. Verbeek", title = "Transformation invariant component analysis for binary images", booktitle = cvpr, pages = "254--259", year = 2006 }