CPSC 340 and 532M - Machine Learning and Data Mining (Fall 2019)

Lecture Sections (beginning September 4): Instructor: Mark Schmidt
Instructor office hours will start at the end of class on Wednesdays, and then move to the following location:
*Will be held in ICICS 193 on Sep 4, Oct 16, and Nov 13.
**Will be held in ICICS 193 on Sep 11 and Sep 25, and cancelled on October 2.

Tutorials (beginning September 9):

Teaching assistants: Sarah Elhammadi, Dylan Green, Nam Hee Kim (Head TA), Frederik Kunstner, Ke (Mark) Ma, Lironne Kurzman, Benjamin Paul-Dubois-Taine, Michael Przystupa, Shahriar Shayesteh, Betty Shea, Karl Slakov, Yihan (Joey) Zhou
TA office hours (all in Demco Learning Centre):

Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Registration: Undergraduate and graduate students from any department are welcome to take the course. Undergraduate students should enroll in CPSC 340 while graduate students should enroll in CPSC 532M (which has an extra small project component). Below are more details on registration for each course:

Starting in the second week of classes, we'll have weekly tutorials run by the TAs. These will go through provided assignment code, review background material and big concepts, and/or work through exercises. You can register for a particular tutorial section if you want to save a seat at a particular time, but you do not need to register in a tutorial section.


***Students who completed STAT 200, STAT 203, PSYC 218, PSYC 278, BIOL 300 or COMM 291 with a grade of at least 72% prior to 2019W1 will be allowed to take CPSC 340 in 2019W, provided they have all of the other stated prerequisites. Students who do not meet these requirements should consider taking CPSC 330, a new course on applied machine learning.

Textbook: There is no required textbook for the class. Introductory books that cover many (but not all) of the topics we will discuss are the Artificial Intelligence book of Russell and Norvig (AI:AMA) and the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets (MMD).

Related Courses: The most closely related course is CPSC 330: Applied Machine Learning. This course has fewer prerequisites and covers some of the same material, but focuses more on applications than on understanding ML ideas in depth.

Related courses in statistics include: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461 (as well as EOSC 510). A discussion of the difference between CPSC 340 and these various STAT classes written by a former student (Geoff Roeder) is available here.


Piazza for course-related questions.

List of topics

We will roughly cover the following topics:

Lectures, Assignments, Related Readings, and Links

Date Slides Related Readings and Links Homework and Notes
Wed Sep 4 Motivation and Syllabus What is Machine Learning? Machine Learning
Rise of the Machines Talking Machine Episode 1
Fri Sep 6 Exploratory Data Analysis Gotta Catch'em all Why Not to Trust Statistics
Visualization Types Google Chart Gallery Other Tools
Assignment 1 a1.zip a1.tex
Mon Sep 9 Decision Trees A Visual Introduction to Machine Learning, Decision Trees Entropy
AI:AMA 18.2-3, ESL 9.2, ML:APP 16.2
Big-O Notes
Julia Commands
Wed Sep 11 Fundamentals of Learning 7 Steps of Machine Learning IID Cross-validation Bias-variance No Free Lunch
AI:AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5
Course Notation Guide
Fri Sep 13 Probabilistic Classifiers Conditional probability (demo) Naive Bayes Probabilities and Battleship
ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2
Assignment 1 due
Probability Notes Probability Slides
Mon Sep 16 Non-Parametric Models K-nearest neighbours Decision Theory for Darts Norms
AI:AMA 18.8, ESL 13.3, ML:APP 1.4
Assignment 2 a2.zip a2.tex
Wed Sep 18 Ensemble Methods Ensemble Methods Random Forests Empirical Study Kinect
AI:AMA 18.10, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6
Fri Sep 20 Clustering Clustering K-means clustering (demo) K-Means++ (demo)
IDM 8.1-8.2, ESL 14.3
Mon Sep 23
More Clustering DBSCAN (video, demo) Hierarchical Clustering Phylogenetic Trees
IDM 8.4
Wed Sep 25
Outlier Detection Empirical Study
IDM 8.3, ESL 14.3.12, ML:APP 25.5
Fri Sep 27
Finding Similar Items
(Bonus Lecture)
MMD Chapter 3 Assignment 2 due
Mon Sep 30
Least Squares Linear Regression (demo, 2D data, 2D video) Least Squares Essence of Calculus Partial Derivative Gradient
ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6
Assignment 3 a3.zip a3.tex
Calculus Notes
Wed Oct 2
Nonlinear Regression Why should one learn machine learning from scratch? Essence of Linear Algebra Matrix Differentiation Fluid Simulation (video)
ESL 5.1, 6.3
Linear Algebra Notes
Linear/Quadratic Gradients
Fri Oct 4
Gradient Descent Gradient Descent Convex Functions
Mon Oct 7
Robust Regression ML:APP 7.4
Wed Oct 9
Feature Selection Genome-Wide Association Studies AIC, BIC
ESL 3.3, 7.5-7
Fri Oct 11
Regularization ESL 3.4, ML:APP 7.5, AI:AMA 18.4 Assignment 3 due
Wed Oct 16
More Regularization RBF video RBF and Regularization video
ESL 6.7, ML:APP 13.3-4
Thu Oct 17
Fri Oct 18
Linear Classifiers Perceptron
ESL 4.5, ML:APP 8.5
Mon Oct 21
More Linear Classifiers Support Vector Machines
ESL 4.4, 12.1-2, ML:APP 8.1-3, 9.5, 14.5, AI:AMA 18.9
Assignment 4 a4.zip a4.tex
Wed Oct 23
Feature Engineering Gmail Priority Inbox
Fri Oct 25
Mon Oct 28
Kernel Trick ESL 12.3, ML:APP 14.1-4
Wed Oct 30
Stochastic Gradient Stochastic Gradient
ML:APP 8.5
Fri Nov 1
Boosting AdaBoost (video) XGBoost (video)
ML:APP 16.4
Assignment 4 due
Max and Argmax Notes
Mon Nov 4
MLE and MAP Maximum Likelihood Estimation
ML:APP 9.3-4
Assignment 5 a5.zip a5.tex
Wed Nov 6
Principal Component Analysis Principal Component Analysis
ESL 14.5, IDM B.1, ML:APP 12.2
Fri Nov 8
More PCA Making Sense of PCA SVD Eigenfaces
Wed Nov 13
Sparse Matrix Factorization Non-Negative Matrix Factorization (original - access from UBC)
ESL 14.6, ML:APP 13.8
Fri Nov 15
Recommender Systems Recommender Systems Netflix Prize Assignment 5 due
Mon Nov 18
Multi-Dimensional Scaling Nonlinear Dimensionality Reduction t-SNE demo
ESL 14.8-9, IDM B.2
Assignment 6 a6.zip a6.tex
Wed Nov 20
Deep Learning Google Video What is a Neural Network? Interactive Guide
ML:APP 16.5, ESL 11.1-4, AI:AMA 18.7
Fri Nov 22
More Deep Learning Fortune Article Deep Learning References Alchemy
ML:APP 28.3, ESL 11.5
Mon Nov 25
Convolutional Neural Networks Convolutional Neural Networks
ML:APP 28.4, ESL 11.7
Wed Nov 27
2pm: Semi-Supervised Learning (CHEM D300)
4pm: More CNNs (MCML 166)
Semi-Supervised Learning Label Propagation at Google
Fri Nov 29
2pm: PageRank (CHEM D300)
4pm: Number of Iterations (MCML 166)
PageRank Slides PageRank Math/Code, ESL 14.10, AI:AMA 22.3
Non-convex PL Inequality
Assignment 6 due

Mike's Demos

In semesters where Mike Gelbart teaches the course, he uses a variety of Python notebooks. Julia versions of these notebooks are available here.

Related courses that have online notes
