CPSC 340 - Machine Learning and Data Mining (Fall 2017)

Lectures (beginning September 6): Mondays, Wednesdays, and Fridays 4-5 (Forest Sciences Centre 1005).

Instructor: Mark Schmidt.
Instructor office hours: Tuesdays at 3-4pm (ICICS 146).

Tutorials (beginning September 11):

Mondays from 5-6 (DMP 101).
Tuesdays from 3:30-4:30 and 4:30-5:30 (DMP 201).
Wednesdays from 9-10 and 10-11 (DMP 201).

Teaching Assistants: Clement Fung, Hashemi Hooman, Siyuan He, Tanner Johnson, Angad Kalra, Aaron Mishkin, Xin Bei She, Sharan Vaswani, Nasim Zolaktaf, Zainab Zolaktaf
TA office hours (all in Demco Learning Centre):

Mondays 1-2 (Siyuan at Table 3).
Tuesdays 2-3 (Aaron at Table 1).
Wednesdays 2-3 (Hooman at Table 2).
Thursdays 2-3 (Clement at Table 4, with Aaron on weeks when assignments are due).
Fridays 10-11 (Angad at Table 2).

Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Registration: Undergraduate and graduate students from any department are welcome to take the class. However, due to high demand, only UBC computer science majors can register for the course directly. All other students need to sign up for the wait list (before September 14) to enroll. Note that last year all students on the wait list were ultimately accepted into the course (but we did not have room for auditors).

Prerequisites:

Basic algorithms and data structures (CPSC 221, or both of CPSC 260 and EECE 320 as well as one of CPSC 210, EECE 201, or EECE 309).
Linear algebra (one of MATH 152, 221, or 223).
Probability (one of STAT 200, STAT 203, STAT 241, STAT 251, STAT 302, MATH 302, MATH 318, or BIOL 300).
Multivariate calculus (one of MATH 200, 217, 226, 253, or 263).

Graduate students may receive a warning about prerequisites when registering and may need to follow additional steps described here.

Textbook: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets (MMD).

Related Courses: Courses in statistics that cover related material include STAT 305, STAT 306, STAT 406, STAT 460, and STAT 461 (as well as EOSC 510). A discussion of the differences between CPSC 340 and these STAT classes, written by a former student (Geoff Roeder), is available here.

Grading: Assignments 30%, Midterm 20%, Final 50%.

Piazza for course-related questions.

List of topics

We will roughly cover the following topics:

Data representation and summarization.
Supervised learning with frequencies and distances.
Data clustering, outlier detection, and association rules.
Linear prediction, regularization, and kernels.
Latent-factor models and collaborative filtering.
Neural networks and deep learning.

Timetable

Date	Slides	Related Readings and Links	Homework and Notes
Wed Sep 6	Motivation and Syllabus	What is Machine Learning? Machine Learning Rise of the Machines Talking Machine Episode 1	Assignment 0 a0.zip a0.tex
Fri Sep 8	Exploratory Data Analysis	Gotta Catch'em all Why Not to Trust Statistics Visualization Types Google Chart Gallery Other Tools
Mon Sep 11	Decision Trees	A Visual Introduction to Machine Learning, Decision Trees AI:AMA 18.2-3, ESL 9.2, ML:APP 16.2	Big-O Notes Julia Commands
Wed Sep 13	Fundamentals of Learning	7 Steps of Machine Learning IID Cross-validation Bias-variance No Free Lunch AI:AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5	Course Notation Guide Tutorial 1
Fri Sep 15	Probabilistic Classifiers	Conditional probability (demo) Naive Bayes ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2	Assignment 0 due Probability Notes Probability Slides
Mon Sep 18	Non-Parametric Models	K-nearest neighbours Decision Theory for Darts Norms AI:AMA 18.8, ESL 13.3, ML:APP 1.4	Assignment 1 a1.zip a1.tex
Wed Sep 20	Ensemble Methods	Ensemble Methods Random Forests Empirical Study Kinect AI:AMA 18.10, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6	Tutorial 2
Fri Sep 22	Clustering	Clustering K-means clustering (demo) K-Means++ (demo) IDM 8.1-8.2, ESL 14.3
Mon Sep 25	Density-based Clustering	DBSCAN (video, demo) IDM 8.4	Tutorial 3
Wed Sep 27	Hierarchical Clustering	Hierarchical Clustering Phylogenetic Trees IDM 8.3, ESL 14.3.12, ML:APP 25.5
Fri Sep 29	Finding Similar Items	MMD Chapter 3	Assignment 1 due
Mon Oct 2	Least Squares	Linear Regression (demo, 2D data, 2D video) Least Squares ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6, Essence of Calculus	Assignment 2 a2.zip a2.tex
Wed Oct 4	Normal Equations	Why should one learn machine learning from scratch? Essence of Linear Algebra Convex Functions	Tutorial 4 Linear Algebra Notes Linear/Quadratic Gradients
Fri Oct 6	Numerical Optimization
Wed Oct 11	Gradient Descent	Gradient Descent ML:APP 7.4	Tutorial 5
Fri Oct 13	Nonlinear Regression	Fluid Simulation ESL 5.1, 6.3	Assignment 2 due
Mon Oct 16	Feature Selection	Genome-Wide Association Studies AIC, BIC ESL 3.3, 7.5-7
Wed Oct 18	Regularization	ESL 3.4, ML:APP 7.5, AI:AMA 18.4
Fri Oct 20	Midterm
Mon Oct 23	More Regularization	RBF video RBF and Regularization video ESL 6.7, ML:APP 13.3-4	Assignment 3 a3.zip a3.tex
Wed Oct 25	Linear Classifiers	Perceptron ESL 4.5, ML:APP 8.5	Tutorial 6
Fri Oct 27	More Linear Classifiers	Support Vector Machines ESL 4.4, 12.1-2, ML:APP 8.1-3, 14.5, AI:AMA 18.9
Mon Oct 30	Kernel Trick	ESL 12.3, ML:APP 14.1-4	Assignment 4 a4.zip a4.tex
Wed Nov 1	Stochastic Gradient	Stochastic Gradient ML:APP 8.5
Fri Nov 3	Multi-Class Classification	ESL 4.4, ML:APP 8.3.7, 9.5	Assignment 3 due
Mon Nov 6	MLE and MAP	Maximum Likelihood Estimation ML:APP 9.3-4	Max and Argmax Notes
Wed Nov 8	Principal Component Analysis	Principal Component Analysis ESL 14.5, IDM B.1, ML:APP 12.2	Tutorial 8
Fri Nov 10	More PCA	Making Sense of PCA SVD Eigenfaces
Wed Nov 15	Sparse Matrix Factorization	Non-Negative Matrix Factorization ESL 14.6, ML:APP 13.8	Assignment 5 a5.zip a5.tex
Fri Nov 17	Recommender Systems	Recommender Systems Netflix Prize	Assignment 4 due
Mon Nov 20	Multi-Dimensional Scaling	Nonlinear Dimensionality Reduction ESL 14.8-9, IDM B.2
Wed Nov 22	Deep Learning	Google Video What is a Neural Network? Interactive Guide ML:APP 16.5, ESL 11.1-4, AI:AMA 18.7	Tutorial 9
Fri Nov 24	More Deep Learning	Fortune Article Deep Learning References ML:APP 28.3, ESL 11.5
Mon Nov 27	Convolutional Neural Networks	Convolutional Neural Networks AlexNet ML:APP 28.4, ESL 11.7	Assignment 5 due
Wed Nov 29	More CNNs
Fri Dec 1	Guest Lecture: Siamak Ravanbakhsh

Related courses that have online notes

Machine Learning and Data Mining (UBC 2012)
Introduction to Machine Learning (Alberta - Schuurmans)
Practical Machine Learning (Berkeley)
Machine Learning (MIT)
Machine Learning (CMU)
Course in Machine Learning (Maryland)
Principles of Knowledge Discovery in Data (Alberta)
Mining Massive Data Sets (Stanford)
Data Mining (CMU)
