CS340 Machine Learning and Data Mining - 2010 Winter

Lectures MWF 4.00-5.00, Dempster 301
Calendar entry

Prerequisites: Linear algebra, calculus, probability theory, programming (Matlab).

Tutorial T2A F 3.00-4.00, Dempster 101 
Tutorial T2B M 11.00-12.00, Dempster 101
Instructor: Arnaud Doucet. Office hours: Monday 5.00-6.00.

TAs: Marcos Ginestra (ginestra@cs.ubc.ca), Paul Vanetti (pvanetti@cs.ubc.ca)
Office hours: Wednesday 11-noon (Paul) and Thursday 2.00-3.00 (Marcos), Demco learning center

Online Discussion: cs340ubc2010 Google group. Please join the group, as we will use it for class-related announcements and discussions.

Textbook: Draft copies of the textbook by Kevin Murphy, Machine Learning: a probabilistic approach (MLAPA). They will be made available for purchase for $XX from Copiesmart in the UBC Village (next to McDonald's).
We will also use Bayesian Reasoning and Machine Learning by David Barber, Pattern Recognition and Machine Learning by Chris Bishop, and The Elements of Statistical Learning by Hastie, Tibshirani and Friedman. You do not need to buy these books, and they are more advanced than the level of this course.

Tentative grading policy: Midterm 25%, Assignments 25%, Final 50%

Missed homework/exam policy

Learning objectives of course

Assignments: Assignments will involve both written and Matlab programming problems. All assignments are due by 4pm on the specified date. Late assignments lose 20% for each day late and will not be accepted more than 5 days late.

Basics

Programming language: The programming language of the course is Matlab. I strongly recommend you follow this link and become familiar with Matlab.
Maths: If you do not feel comfortable with calculus, linear algebra and probability, please read the following material:

* Cribsheet
* Linear algebra: A review, another review
* Probability: Probability theory refresher, another review

News

* Exercises

* Solutions of Q2 & Q3 for HW5

Syllabus/Timetable

| L# | Date | Slides | Reading | Homework |
|---|---|---|---|---|
| L1 | Wed Jan 5 | Introduction | Optional: | |
| L2 | Fri Jan 7 | Introduction to classification | | |
| L3 | Mon Jan 10 | Introduction to classification | | Tutorial Matlab slides.pdf |
| L4 | Wed Jan 12 | K Nearest Neighbors | | HW1.pdf, Data |
| L5 | Fri Jan 14 | K Nearest Neighbors (cont.) | Read Sections 1.2.4, 1.2.5 and 1.8.5.1 | |
| L6 | Mon Jan 17 | Principal Component Analysis | Read Sections 31.1 to 31.4 and 31.7.1 (linear algebra) and Section 21.3.3 (PCA) | |
| L7 | Wed Jan 19 | Principal Component Analysis (cont.) | | |
| L8 | Fri Jan 21 | Principal Component Analysis (cont.) | | |
| L9 | Mon Jan 24 | Principal Component Analysis and SVD (cont.) | | HW2.pdf, twofours.mat, matrix.dat, literals.dat, documents.pdf; Tutorial slides.pdf; Solutions of HW2: Q1 (pcavisual.m), Q2 (face2.jpg, pca.m, q2.m, proof.pdf), Q3 (q3.m), Q4 (q4.pdf) |
| L10 | Wed Jan 26 | Probability Refresher | Read Sections 2.1 to 2.7 | |
| L11 | Fri Jan 28 | Google's PageRank | Read Sections 2.8 and 30.7; Optional reading: Very informal introduction to PageRank; Optional reading: The $25,000,000,000 Eigenvector - The Linear Algebra Behind Google | Tutorial code.zip |
| L12 | Mon Jan 31 | Google's PageRank | | |
| L13 | Wed Feb 2 | Google's PageRank | PageRank Code: surfer.m, pagerank.m | HW3.pdf, adjency.mat; Solutions of HW3: Q1 (solution, matlab), Q2 (solution), Q3 (solution), Q4 (solution, matlab) |
| L14 | Fri Feb 4 | Naive Bayes Classifiers | Read Sections 1.4.3, 1.4.4, 1.4.5, 1.4.6 | |
| L15 | Mon Feb 7 | Maximum Likelihood | Read Sections 3.1 to 3.2.4 | |
| L16 | Wed Feb 9 | ML and Bayesian Statistics | Read Sections 4.1 to 4.5 | |
| L17 | Fri Feb 11 | Midterm | Midterm and Midterm solutions | |
| L18 | Mon Feb 21 | Bayesian Statistics | Read Sections 4.1 to 4.5 | |
| L19 | Wed Feb 23 | Bayesian Statistics | Read Sections 4.6, 4.8 and 4.9 | |
| L20 | Fri Feb 25 | More Bayes statistics and Linear Regression | Read Section 1.3 | |
| L21 | Mon Feb 28 | Linear Regression (Least squares and Nonlinear models) | Read Sections 1.3 and 1.7.1 to 1.7.3 | |
| L22 | Wed Mar 2 | Linear Regression (Nonlinear models and Probabilistic Interpretation) | Read Sections 1.3 and 1.7.1 to 1.7.3 | HW4.pdf, nursery.mat, motor.mat; Solutions of HW4: Q1 (compressed tar file), Q2 |
| L23 | Fri Mar 4 | Linear Regression (Robust regression) | Read Sections 1.7.4 to 1.8.5 | |
| L24 | Mon Mar 7 | Linear Regression (Ridge and Lasso regression) | Read Sections 1.7.4 to 1.8.5 | |
| L25 | Wed Mar 9 | Logistic Regression | Read Sections 1.2.7 to 1.2.12 and 11.1 to 11.2 | Marco notes tutorial |
| L26 | Fri Mar 11 | Logistic Regression | | |
| L27 | Mon Mar 14 | Logistic Regression | Read Sections 16.1 to 16.3 | Marco notes tutorial logistic regression |
| L28 | Wed Mar 16 | Neural Networks | Additional reading: M. Titterington, Bayesian methods for neural networks and related models, Stat. Science, 2004 | |
| L29 | Fri Mar 18 | Neural Networks | | |
| L30 | Mon Mar 21 | Multivariate Gaussian Distributions and Discriminant Analysis | Read Sections 5.1 to 5.3 and Section 1.4.1 | HW5.pdf, spamdata.mat; Solutions of HW5: Q2, Q3 |
| L31 | Wed Mar 23 | Multivariate Gaussian Distributions and Discriminant Analysis | | |
| L32 | Fri Mar 25 | Unsupervised Learning: K-Means | Read Sections 20.2.1 to 20.2.3 | |
| L33 | Mon Mar 28 | Unsupervised Learning: Finite Mixture Models and EM Algorithm | Read Section 1.5.1 and Section 11.4 | |
| L34 | Wed Mar 30 | Unsupervised Learning: Finite Mixture Models and EM Algorithm | | |
| L35 | Fri Apr 1 | Hidden Markov Models (draft) | Read Sections 6.3.5, 23.1 and 23.2 | |
| L36 | Mon Apr 4 | Hidden Markov Models | | |
| L37 | Wed Apr 6 | Review | | |
| Final | Wed Apr 27 | Final exam in DPM 310 at 8.30am | | |