CPSC 340 - Machine Learning and Data Mining (Fall 2015)

Lectures: Mondays, Wednesdays, and Fridays (3-4 in West Mall Swing Space 121)

Tutorials: Mondays from 11-12 (DMP 201), 2-3 (DMP 201), 4-5 (DMP 201), and 5-6 (DMP 101).

Office hours: Tuesdays from 10-11 (ICICS X836, except November 17 when it will be ICICS 146) and 4-5 (ICICS 146), Thursdays from 3-4 (ICICS X836).

Instructor: Mark Schmidt

Teaching Assistants: Ricky (Tian Qi) Chen, Issam Laradji, Bobak Shahriari, Sharan Vaswani, Yan Zhao.

Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Textbook: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.


Registration: Undergraduate and graduate students from any department are welcome to take the class, provided that they satisfy the prerequisites. If you do not satisfy the exact prerequisites but would still like to enroll in the class, please fill out the form available here. If you are interested in these topics but the course is full, please sign up for the waiting list; a certain number of students are likely to shift their schedules, which will open up spots, while a long waiting list makes it more likely that we can have multiple sections and multiple courses on these topics. You may also want to consider taking related courses from statistics: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461.

Grading: Assignments 25%, Midterm 30%, Final 45%.

Questions: Use Piazza for course-related questions.

List of topics

We will roughly cover the following topics:


Date | Topic | Related Readings and Links | Homework and Notes
Wed Sep 9 | Syllabus | Wikipedia Machine Learning Rise of the Machines Talking Machine Episode 1 | Assignment 1, a1.zip Notes on Probability
Fri Sep 11 | Data Exploration | Visualization Types Google Chart Gallery Matlab demos Other Tools |
Mon Sep 14 | Decision Trees | A Visual Introduction to Machine Learning, AI:AMA 18.2-3, ESL 9.2, ML:APP 16.2 | Notes on Big-O
Wed Sep 16 | Learning Theory | AI:AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5 |
Fri Sep 18 | Generative Models | ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2 | Assignment 1 due
Mon Sep 21 | Non-Parametric Models | AI:AMA 18.8, ESL 13.3, ML:APP 1.4 | Assignment 2, a2.zip
Wed Sep 23 | Ensemble Methods | AI:AMA 18.10, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6 |
Fri Sep 25 | Clustering | IDM 8.1-8.2, ESL 14.3, K-Means++ Demo |
Mon Sep 28 | Density-based Clustering | IDM 8.4 |
Wed Sep 30 | Hierarchical Clustering | IDM 8.3, ESL 14.3.12, ML:APP 25.5 |
Fri Oct 2 | Association Rules | IDM 6.1-6.3, ESL 14.2 | Assignment 2 due
Mon Oct 5 | Linear Regression | Partial Derivatives and Gradients (Part 2, Part 3, Part 4), ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6 | Assignment 3, a3.zip, Notes on Linear Algebra
Wed Oct 7 | Non-Linear Regression | ESL 5.1, 6.3, and 6.7 |
Fri Oct 9 | Regularization | ESL 3.4, ML:APP 7.5, AI:AMA 18.4 |
Wed Oct 14 | Robust Regression | ML:APP 7.4 |
Fri Oct 16 | Feature Selection | ESL 3.3, ML:APP 13.3 |
Mon Oct 19 | Logistic Regression | ESL 4.4, ML:APP 8.1-3, AI:AMA 18.9 |
Wed Oct 21 | Kernel Methods | ESL 4.5 and 12.1-3, ML:APP 14.1-5 |
Fri Oct 23 | Stochastic Gradient | ML:APP 8.5 | Assignment 3 due
Mon Oct 26 | Principal Component Analysis | ESL 14.5, IDM B.1, ML:APP 12.2 |
Wed Oct 28 | Outlier Detection | IDM 10.1-5 |
Fri Oct 30 | | |
Mon Nov 2 | Sparse Matrix Factorization | ESL 14.6, ML:APP 13.8 | Assignment 4, a4.zip
Wed Nov 4 | Recommender Systems | Wikipedia |
Fri Nov 6 | Multi-Dimensional Scaling | ESL 14.8-9, IDM B.2 |
Mon Nov 9 | Neural Networks | Google, ML:APP 16.5, ESL 11.1-4, AI:AMA 18.7 |
Fri Nov 13 | Deep Learning | ML:APP 28.3, ESL 11.5 | Assignment 4 due
Mon Nov 16 | Convolutional Neural Networks | ML:APP 28.4, ESL 11.7 | Assignment 5, a5.zip Tutorial 5a
Wed Nov 18 | Discrete Labels | ML:APP 8.3.7 and 9.3-5, ESL 4.4 |
Fri Nov 20 | Semi-Supervised Learning | Wikipedia |
Mon Nov 23 | Ranking | PageRank Slides, PageRank math/code, ESL 14.10, ML:APP 9.7, AI:AMA 22.3 | Tutorial 5b
Wed Nov 25 | Spectral Clustering | ESL 14.5.3, ML:APP 25.4 | Assignment 6, a6.zip
Fri Nov 27 | Sequence Mining | IDM 7.4 | Assignment 5 due
Mon Nov 30 | Markov Chains | AI:AMA 15.1-3, ML:APP 17.1-4 | Tutorial 6
Wed Dec 2 | Belief Networks | AI:AMA 14.1-4, ML:APP 10.1-5 |
Fri Dec 4 | Course Review/Preview | | Assignment 6 due

Related courses that have online notes
