CPSC 340 and 532M - Machine Learning and Data Mining (Fall 2022)

Lecture sections (beginning September 6):
Instructors: Andreas Lehrmann and Mark Schmidt
Instructor office hours: Fridays at 1pm (ICICS X139)

Tutorials (beginning September 12):

Teaching assistants: Daniel Alisafe, Curtis Fox, Ruiyu Gou, Lironne Kurzman, Frederik Kunstner, Alan Milligan, Justin Rahardjo, Betty Shea, Xin Ping Shi, Chenwei Zhang, Tianyue Zhang
TA office hours (all in Demco Learning Centre): See Piazza

Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science. These techniques now run behind the scenes to discover patterns and make predictions in many applications in our daily lives. We will focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Registration: Undergraduate and graduate students from any department are welcome to take the course. Undergraduate students should enroll in CPSC 340, while graduate students should enroll in CPSC 532M (which includes an additional small project component). Below are more details on registration for each course:

Starting in the second week of classes, the TAs will run weekly tutorials. These will go through provided assignment code, review background material and big concepts, and/or work through exercises. You can register for a particular tutorial section if you want to reserve a seat at a particular time, but you do not need to register in a tutorial section.

Prerequisites:

Students who do not meet these requirements should consider taking CPSC 330 ("Applied Machine Learning").

Textbook: There is no required textbook for the class. Introductory books that cover many (but not all) of the topics we will discuss are the Artificial Intelligence book of Russell and Norvig (AI:AMA) and the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.

Related Courses: The most closely related course is CPSC 330: Applied Machine Learning. This course has fewer prerequisites and covers some of the same material, but focuses more on applications than on understanding ML ideas in depth.

Related courses in statistics include STAT 305, STAT 306, STAT 406, STAT 460, and STAT 461 (as well as EOSC 510). A discussion of the differences between CPSC 340 and these STAT classes, written by a former student (Geoff Roeder), is available here (it was written in 2016, so it may be out of date).

Grading:

List of topics

We will roughly cover the following topics:

Lectures, Assignments, Related Readings, and Links

Date Slides Related Readings and Links Homework and Notes
Wed Sep 7 Motivation and Syllabus What is Machine Learning? Machine Learning
Rise of the Machines Talking Machine Episode 1
Mathematics for Machine Learning
Assignment 1 a1.zip a1.tex
Fri Sep 9 Exploratory Data Analysis Gotta Catch'em all Why Not to Trust Statistics
Visualization Types Google Chart Gallery Other Tools
Mon Sep 12 Decision Trees A Visual Introduction to Machine Learning, Decision Trees Entropy What is Big O Notation?
AI:AMA 19.2-3, ESL: 9.2, ML:APP 16.2
Big-O Notes
Julia Commands
Wed Sep 14 Fundamentals of Learning 7 Steps of Machine Learning IID Cross-validation Bias-variance No Free Lunch
AI:AMA 19.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5
Course Notation Guide
Fri Sep 16 Probabilistic Classifiers Conditional probability (demo) Naive Bayes Probabilities and Battleship
AI:AMA 12.6, ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2
Assignment 1 due
Probability Notes Probability Slides
Wed Sep 21 Non-Parametric Models K-nearest neighbours Decision Theory for Darts Norms
AI:AMA 19.7, ESL 13.3, ML:APP 1.4
Assignment 2 a2.zip a2tex.zip
Fri Sep 23 Ensemble Methods Ensemble Methods Random Forests Empirical Study Kinect
AI:AMA 19.8, ESL: 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6
Mon Sep 26 Clustering Clustering K-means clustering (demo) K-Means++ (demo)
IDM 8.1-8.2, ESL: 14.3
Wed Sep 28 More Clustering DBSCAN (video, demo) Hierarchical Clustering Phylogenetic Trees
IDM 8.4
Mon Oct 3 Outlier Detection Empirical Study
IDM 8.3, ESL 14.3.12, ML:APP 25.5
Assignment 2 due
Wed Oct 5 Linear Regression Linear Regression (demo, 2D data, 2D video) Least Squares Essence of Calculus Partial Derivative Gradient
ESL 3.1-2, ML:APP 7.1-3, AI:AMA 19.6
Assignment 3 a3.zip a3tex.zip
Fri Oct 7 Nonlinear Regression Why should one learn machine learning from scratch? Essence of Linear Algebra Matrix Differentiation Fluid Simulation (video)
ESL 5.1, 6.3
Linear Algebra Notes
Linear/Quadratic Gradients
Wed Oct 12 Gradient Descent Gradient Descent Convex Functions
Fri Oct 14 Robust Regression
ML:APP 7.4
Mon Oct 17 Feature Selection Genome-Wide Association Studies AIC, BIC
ESL 3.3, 7.5-7
Assignment 3 due
Wed Oct 19 Regularization
ESL 3.4, ML:APP 7.5, AI:AMA 19.4
Fri Oct 21 More Regularization RBF video RBF and Regularization video
ESL 6.7, ML:APP 13.3-4
Mon Oct 24 Linear Classifiers
6:30pm: Midterm (IRC 2)
Perceptron
ESL 4.5, ML:APP 8.5
Wed Oct 26 More Linear Classifiers Support Vector Machines
ESL 4.4, 12.1-2, ML:APP 8.1-3, 9.5, 14.5, AI:AMA 19.6
Fri Oct 28 Feature Engineering Gmail Priority Inbox
Assignment 4 a4.zip a4tex.zip
Mon Oct 31 Convolutions But what is a convolution?
Wed Nov 2 Kernel Trick
ESL 12.3, ML:APP 14.1-4
Fri Nov 4 Stochastic Gradient Descent Stochastic Gradient Descent, Theory and Practice
ML:APP 8.5
Mon Nov 7 Boosting AdaBoost (video) XGBoost (video)
ML:APP 16.4
Max and Argmax Notes
Mon Nov 14 MLE and MAP Maximum Likelihood Estimation
ML:APP 9.3-4
Assignment 4 due
Wed Nov 16 Principal Component Analysis Principal Component Analysis
ESL 14.5, IDM B.1, ML:APP 12.2
Assignment 5 a5.zip a5tex.zip
Fri Nov 18 More PCA Making Sense of PCA SVD Eigenfaces
Mon Nov 21 Beyond PCA Non-Negative Matrix Factorization (original - access from UBC) Recommender Systems Netflix Prize
ESL 14.6, ML:APP 13.8
Wed Nov 23 Multi-Dimensional Scaling Nonlinear Dimensionality Reduction t-SNE demo
ESL 14.8-9, IDM B.2
Assignment 6 a6.zip a6.tex
Fri Nov 25 Neural Networks Google Video What is a Neural Network? Interactive Guide
ML:APP 16.5, ESL 11.1-4, AI:AMA 21.1
Assignment 5 due
Mon Nov 28 Over-Parameterization
Wed Nov 30 Deep Neural Networks Fortune Article Deep Learning References Alchemy
ML:APP 28.3, ESL 11.5, AI:AMA 21.2 and 21.4-5
Fri Dec 2 Convolutional Neural Networks Convolutional Neural Networks
ML:APP 28.4, ESL 11.7, AI:AMA 21.3
Mon Dec 5 Autoencoders and Multi-Label
AM: Fully-Convolutional Networks
PM: Recurrent Neural Networks
AI:AMA 21.6-8
Wed Dec 7 AM: What do we Learn?
PM: LSTMs and Transformers
Assignment 6 due

Mike's Demos

In semesters where Mike Gelbart taught the course, he used a variety of Python notebooks. Julia versions of these notebooks are available here.

Related courses that have online notes
