CPSC 340 - Machine Learning and Data Mining (Fall 2017)

Lectures (beginning September 6): Mondays, Wednesdays, and Fridays 4-5 (Forest Sciences Centre 1005).

Instructor: Mark Schmidt.
Instructor office hours: Tuesdays 3-4pm (ICICS 146).

Tutorials (beginning September 11):

Teaching Assistants: Clement Fung, Hashemi Hooman, Siyuan He, Tanner Johnson, Angad Kalra, Aaron Mishkin, Xin Bei She, Sharan Vaswani, Nasim Zolaktaf, Zainab Zolaktaf
TA office hours (all in Demco Learning Centre):

Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the 'big data' buzzword. These techniques now run behind the scenes to discover patterns and make predictions in many applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Registration: Undergraduate and graduate students from any department are welcome to take the class. However, due to high demand, only UBC computer science majors can register directly for the course. All other students must sign up for the wait list (before September 14) to enroll. Note that last year all students on the wait list were ultimately accepted into the course (but we did not have room for auditors).

Prerequisites:

Graduate students may receive a warning about prerequisites when registering and may need to follow additional steps described here.

Textbook: There is no required textbook for the class. Introductory books that cover many (but not all) of the topics we will discuss are the Artificial Intelligence book of Russell and Norvig (AI:AMA) and the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets (MMD).

Related Courses: Related courses in statistics include STAT 305, STAT 306, STAT 406, STAT 460, and STAT 461 (as well as EOSC 510). A former student (Geoff Roeder) has written a discussion of the differences between CPSC 340 and these STAT courses, available here.

Grading: Assignments 30%, Midterm 20%, Final 50%.

Use Piazza for course-related questions.

List of topics

We will roughly cover the following topics:

Timetable

Date Slides Related Readings and Links Homework and Notes
Wed Sep 6 Motivation and Syllabus What is Machine Learning? Machine Learning
Rise of the Machines Talking Machine Episode 1
Assignment 0 a0.zip a0.tex
Fri Sep 8 Exploratory Data Analysis Gotta Catch'em all Why Not to Trust Statistics
Visualization Types Google Chart Gallery Other Tools
Mon Sep 11 Decision Trees A Visual Introduction to Machine Learning, Decision Trees
AI:AMA 18.2-3, ESL 9.2, ML:APP 16.2
Big-O Notes
Julia Commands
Wed Sep 13 Fundamentals of Learning 7 Steps of Machine Learning IID Cross-validation Bias-variance No Free Lunch
AI:AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5
Tutorial 1
Fri Sep 15 Probabilistic Classifiers Conditional probability (demo) Naive Bayes
ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2
Assignment 0 due
Probability Notes Probability Slides
Mon Sep 18 Non-Parametric Models K-nearest neighbours Decision Theory for Darts Norms
AI:AMA 18.8, ESL 13.3, ML:APP 1.4
Assignment 1 a1.zip a1.tex
Wed Sep 20 Ensemble Methods Ensemble Methods Random Forests Empirical Study Kinect
AI:AMA 18.10, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6
Tutorial 2
Fri Sep 22 Clustering Clustering K-means clustering (demo) K-Means++ (demo)
IDM 8.1-8.2, ESL 14.3
Mon Sep 25 Density-based Clustering DBSCAN (video, demo)
IDM 8.4
Tutorial 3
Wed Sep 27 Hierarchical Clustering Hierarchical Clustering Phylogenetic Trees
IDM 8.3, ESL 14.3.12, ML:APP 25.5
Fri Sep 29 Finding Similar Items MMD Chapter 3
Assignment 1 due
Mon Oct 2 Least Squares Linear Regression (demo, 2D data, 2D video) Least Squares
ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6, Essence of Calculus
Assignment 2 a2.zip a2.tex
Wed Oct 4 Normal Equations Why should one learn machine learning from scratch?
Essence of Linear Algebra Convex Functions
Tutorial 4 Linear Algebra Notes
Linear/Quadratic Gradients
Fri Oct 6 Numerical Optimization
Wed Oct 11 Gradient Descent Gradient Descent
ESL 5.1, ML:APP 7.4
Tutorial 5
Fri Oct 13 Nonlinear Regression Fluid Simulation
ESL 6.3, 7.5, 7.7
Assignment 2 due
Mon Oct 16 Feature Selection Genome-Wide Association Studies AIC, BIC
ESL 3.3, 7.6
Wed Oct 18 Regularization
ESL 3.4, 6.7, ML:APP 7.5, AI:AMA 18.4
Fri Oct 20 Midterm

Related courses that have online notes


