CPSC 340 - Machine Learning and Data Mining (Fall 2016)

Lectures: Mondays, Wednesdays, and Fridays (2-3 in West Mall Swing Space 122) beginning September 7.

Tutorials: Mondays from 4-5 (MacLeod 214) and 5-6 (DMP 101), Tuesdays from 4:30-5:30 (DMP 201), and Wednesdays from 9-10 (CBEB 103) beginning September 12.

Office hours: Tuesdays at 2-3 (ICICS 104) and 3:30-4:30 (DLC Table 4), Wednesdays 4-5 (ICICS X337), Thursdays 4:30-5:30 (ICICS X836), or by appointment.

Instructor: Mark Schmidt

Teaching Assistants: Reza Babanezhad, Ricky Chen, Issam Laradji, Robbie Rolin, Alireza Shafaei, Moumita Roy Tora, Nasim Zolaktaf, Zainab Zolaktaf

Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.


Since multivariate calculus is a new prerequisite, for the 2016-17 year only we are allowing MATH 200 (or equivalent) to be taken as a co-requisite, provided that the average of the other MATH/STAT prerequisites is at least 76%. Other courses that are helpful but not required include scientific computing (CPSC 302), algorithms and complexity (CPSC 320), and statistical inference (STAT 305).

Registration: Undergraduate and graduate students from any department are welcome to take the class, provided that they satisfy the prerequisites. If you do not satisfy the exact prerequisites but would still like to enroll in the class, there are additional details available here and here.

The general seats available in this class usually fill up very quickly. Because of this, we have reserved a small number of restricted seats for CPSC graduate students. These seats will turn into general seats at the end of the first week of class.

Once the general seats are taken, the only way to register for the course is to sign up for the waiting list. You should sign up for the waiting list even if it is long; last year we were able to accommodate all students on the waiting list. Signing up for the waiting list also makes it more likely that we will open up extra sessions, expand class sizes, or offer additional courses on these topics. You may also want to consider taking related courses from statistics: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461 (as well as EOSC 510). A discussion of the difference between CPSC 340 and these various STAT classes written by a former student (Geoff Roeder) is available here.

Textbook: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.

Grading: Assignments 25%, Midterm 30%, Final 45%.

Piazza for course-related questions.

List of topics

We will roughly cover the following topics:


Date Topic Related Readings and Links Homework and Notes
Wed Sep 7
Syllabus Machine Learning Rise of the Machines Talking Machines Episode 1
Fri Sep 9
Data Exploration Gotta Catch'em all Why Not to Trust Statistics
Visualization Types Google Chart Gallery Matlab demos Other Tools
Mon Sep 12
Decision Trees A Visual Introduction to Machine Learning, Decision Trees
Entropy What makes Dr. Seuss so silly?
AI:AMA 18.2-3, ESL 9.2, ML:APP 16.2
Assignment 1 a1.zip
Notes on big-O
Getting Started with Matlab
Wed Sep 14
Learning Theory IID Cross-validation Bias-variance No Free Lunch
AI:AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5
Tutorial 1
Matlab Commands
Fri Sep 16
Generative Models Conditional probability (demo) Naive Bayes Probabilities and Battleship
ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2
Notes on probability
Mon Sep 19
Non-Parametric Models K-nearest neighbours Decision Theory for Darts
AI:AMA 18.8, ESL 13.3, ML:APP 1.4
Wed Sep 21
Ensemble Methods Ensemble Methods Random Forests Empirical Study Kinect
AI:AMA 18.10, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6
Tutorial 2 t2.zip
Fri Sep 23
Clustering Clustering K-means clustering (demo) K-Means++ (demo)
IDM 8.1-8.2, ESL 14.3
Assignment 1 due
Mon Sep 26
Density-based Clustering DBSCAN (video, demo) Norms
IDM 8.4
Assignment 2 a2.zip
Wed Sep 28
Hierarchical Clustering Hierarchical Clustering Phylogenetic Trees
IDM 8.3, ESL 14.3.12, ML:APP 25.5
Tutorial 3
Fri Sep 30
Outlier Detection Survey and Empirical Study
IDM 10.1-5
Mon Oct 3
Association Rules Association Rule Learning Apriori Amazon Product Recommendation
IDM 6.1-6.3, ESL 14.2
Wed Oct 5
Linear Regression Linear Regression (demo, 2D data, 2D video) Least Squares
Partial Derivatives Gradient
ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6
Tutorial 4
Notes on Linear Algebra
Fri Oct 7
Non-Linear Regression Fluid Simulation
ESL 5.1, 6.3, and 6.7
Assignment 2 due
Linear/Quadratic Gradients
Wed Oct 12
Regularization RBF video RBF and Regularization video
ESL 3.4, ML:APP 7.5, AI:AMA 18.4
Assignment 3 a3.zip
Tutorial 5
Fri Oct 14
Gradient Descent Gradient Descent
ML:APP 7.4
Mon Oct 17
Logistic Regression Gmail Priority Inbox
ESL 4.4, ML:APP 8.1-3, AI:AMA 18.9
Wed Oct 19
Support Vector Machines Support Vector Machines
ESL 4.5 and 12.1-2, ML:APP 14.5
Assignment 3 due
Tutorial 6
Fri Oct 21
Kernel Methods ESL 12.3, ML:APP 14.1-4
Mon Oct 24
Stochastic Gradient Stochastic Gradient
ML:APP 8.5
Wed Oct 26
Feature Selection ESL 3.3
Fri Oct 28
Mon Oct 31
L1-Regularization Maximum Likelihood Estimation
ESL 3.4, ML:APP 13.3-4
Assignment 4 a4.zip
Wed Nov 2
Multi-Class Regression ML:APP 8.3.7 and 9.3-5, ESL 4.4 Tutorial 8
Fri Nov 4
Principal Component Analysis Principal Component Analysis
ESL 14.5, IDM B.1, ML:APP 12.2
Mon Nov 7
More PCA SVD Eigenfaces
Wed Nov 9
Sparse Matrix Factorization Non-Negative Matrix Factorization
ESL 14.6, ML:APP 13.8
Tutorial 9
Mon Nov 14
Recommender Systems Recommender Systems Netflix Prize Assignment 5 a5.zip
Assignment 4 due
Wed Nov 16
Multi-Dimensional Scaling Nonlinear Dimensionality Reduction
ESL 14.8-9, IDM B.2
Tutorial 10
Fri Nov 18
Neural Networks Google Video Fortune Article
ML:APP 16.5, ESL 11.1-4, AI:AMA 18.7
Assignment 6 a6.zip
Mon Nov 21
Deep Learning Web book
ML:APP 28.3, ESL 11.5
Wed Nov 23
Convolutional Neural Networks Convolutional Neural Networks AlexNet
ML:APP 28.4, ESL 11.7
Tutorial 11
Fri Nov 25
More CNNs Assignment 5 due
Mon Nov 28
Ranking PageRank Slides, PageRank math/code
ESL 14.10, ML:APP 9.7, AI:AMA 22.3
Wed Nov 30
Semi-Supervised Learning Semi-Supervised Learning Label Propagation at Google Tutorial 12
Fri Dec 2
Course Review/Preview Assignment 6 due

Related courses that have online notes
