

CPSC 340 Machine Learning and Data Mining

Summer 2020

We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Announcements

This Semester

Instructor

Alireza Shafaei

Lecture Sections (beginning May 11, 2020)

Office hours

Tutorials (beginning May 13, 2020)

Teaching assistants

Ramya Rao Basava
Farnoosh Javadi
Lironne Kurzman
Ke (Mark) Ma
Egor Peshkov
Shahriar Shayesteh
Ming Zhang

Services

Note: Zoom is hosted on servers in the U.S., so your name and data about how you use the system will be stored on servers outside Canada. If you have privacy concerns about this data collection: a) don’t create an account with Zoom and provide only your first name or a nickname when you join a session, b) keep your camera off and microphone muted during sessions, and c) don’t share any identifying information about yourself.

Course Calendar

Registration

Undergraduate and graduate students from any department are welcome to take the course. Undergraduate students should enroll in CPSC 340 while graduate students should enroll in CPSC 532M (which has an extra small project component -- not offered in summers). Below are more details on registration for each course:

Starting in the first week of classes, we will have weekly tutorials run by the TAs. The tutorials will go through provided assignment code, review background material and big concepts, and/or work through exercises. You can register for a particular tutorial section if you want to save a seat at that time, but you do not need to register in a tutorial section.

Prerequisites

Graduate students may receive a warning about prerequisites when registering and may need to follow additional steps described here.

Textbook

There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets (MMD).

Related Courses

Related courses in statistics include: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461 (as well as EOSC 510). A discussion of the difference between CPSC 340 and these various STAT classes written by a former student (Geoff Roeder) is available here.

Grading

List of topics

We will roughly cover the following topics:

Lectures

Date Video Slides Related Readings and Links Notes
Mon May 11 may.11.2020.mp4 Motivation and Syllabus What is Machine Learning?  ·  Machine Learning Wiki  ·  Rise of the Machines  ·  Talking Machine Episode 1
Exploratory Data Analysis Gotta Catch'em all  ·  Why Not to Trust Statistics  ·  Visualization Types  ·  Google Chart Gallery  ·  Other Tools See assignment 1 below.
Wed May 13 may.13.2020.mp4 Decision Trees A Visual Introduction to Machine Learning  ·  Decision Trees  ·  Entropy
AI:AMA 18.2-3, ESL: 9.2, ML:APP 16.2
Big-O Notes
Fundamentals of Learning 7 Steps of Machine Learning ·  IID  ·  Cross-validation  ·  Bias-variance  ·  No Free Lunch
AI: AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5
Course Notation Guide
Fri May 15 may.15.2020.mp4 Probabilistic Classifiers Conditional probability (demo)  ·  Naive Bayes  ·  Probabilities and Battleship
ESL 4.3, ML: APP 2.2, 3.5, 4.1-4.2
Probability Notes Probability Slides
Non-Parametric Models K-nearest neighbours  ·  Decision Theory for Darts  ·  Norms
AI: AMA 18.8, ESL 13.3, ML:APP 1.4
Wed May 20 may.20.2020.mp4 Ensemble Methods Ensemble Methods  ·  Random Forests  ·  Empirical Study  ·  Kinect  ·  Data Augmentation
AI: AMA 18.10, ESL: 7.11, 8.2, 15, 16.3, ML: APP 6.2.1, 16.2.5, 16.6
Clustering Clustering  ·  K-means clustering (demo)  ·  K-Means++ (demo)
IDM 8.1-8.2, ESL: 14.3
Fri May 22 may.22.2020.mp4 More Clustering DBSCAN (video demo)  ·  Hierarchical Clustering  ·  Phylogenetic Trees
IDM 8.4
Outlier Detection Empirical Study
IDM 8.3, ESL 14.3.12, ML:APP 25.5
Finding Similar Items (Bonus Lecture)
MMD Chapter 3
Mon May 25 may.25.2020.mp4 Least Squares Linear Regression  ·  (demo, 2D data, 2D video)  ·  Least Squares Essence of Calculus  ·  Partial Derivative  ·  Gradient
ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6
Calculus Notes
Nonlinear Regression Why should one learn machine learning from scratch? ·  Essence of Linear Algebra  ·  Matrix Differentiation ·  Fluid Simulation (video)
ESL 5.1(Poly basis), 6.3 (local regression)
Linear Algebra Notes
Linear/Quadratic Gradients
Wed May 27 may.27.2020.mp4 Gradient Descent Gradient Descent  ·  Convex Functions
Robust Regression ML:APP 7.4
The midterm will be from the material above.
Fri May 29 may.29.2020.mp4 Feature Selection Genome-Wide Association Studies ·  AIC ·  BIC
ESL 3.3, 7.5-7
Regularization ESL 3.4, ML:APP 7.5, AI:AMA 18.4
Mon Jun 1 Midterm Exam in class. You will need a webcam to take the exam.
Mon Jun 1 jun.1.2020.mp4 More Regularization RBF video  ·  RBF and Regularization video
ESL 6.7, ML:APP 13.3-4
HW 4 released
Wed Jun 3 jun.3.2020.mp4 Linear Classifiers Perceptron
ESL 4.5, ML:APP 8.5
More Linear Classifiers Support Vector Machines
ESL 4.4, 12.1-2, ML:APP 8.1-3, 9.5, 14.5, AI:AMA 18.9
Fri Jun 5 jun.5.2020.mp4 Feature Engineering Gmail Priority Inbox
Convolutions + 1/2 Kernels
Mon Jun 8 jun.8.2020.mp4 Kernel Trick (Bonus: all slides) ESL 12.3, ML:APP 14.1-4 HW5 released
1/2 Kernels + Stochastic Gradient Stochastic Gradient
ML:APP 8.5
Boosting AdaBoost (video)  ·  XGBoost (video)
ML:APP 16.4
Max and Argmax Notes
Wed Jun 10 jun.10.2020.mp4 MLE and MAP Maximum Likelihood Estimation
ML:APP 9.3-4
Principal Component Analysis Principal Component Analysis
ESL 14.5, IDM B.1, ML:APP 12.2
Fri Jun 12 jun.12.2020.mp4 More PCA Making Sense of PCA ·  SVD Eigenfaces
Sparse Matrix Factorization Non-Negative Matrix Factorization  ·  (original - access from UBC)
ESL 14.6, ML: APP 13.8
Mon Jun 15 jun.15.2020.mp4 Recommender Systems Recommender Systems  ·  Netflix Prize HW 6 released
Multi-Dimensional Scaling Nonlinear Dimensionality Reduction  ·  t-SNE demo
ESL 14.8-9, IDM B.2
The FINAL will be from the material above.
Wed Jun 17 jun.17.2020.mp4 Deep Learning Google Video  · What is a Neural Network?  · Interactive Guide
ML:APP 16.5, ESL 11.1-4, AI: AMA 18.7
More Deep Learning Fortune Article  · Deep Learning References ·  Alchemy ·  Convolutional Neural Networks
ML:APP 28.4, ESL 11.7
ML:APP 28.3, ESL 11.5
Mon Jun 22, 8:00 AM to Wed Jun 24, 11:00 AM Final Exam
  • Take-home
  • Individually
  • Open-book
See this Piazza post for more information.

The acronyms in the table above refer to the following textbooks:

Related courses that have online notes


Homework Assignments

Post Date Due Date Files Notes/Links
Mon May 11 Fri May 15, Sun May 17 a1.pdf  ·  a1.zip (contains the LaTeX template + code)  ·  a1.v2.pdf Setting up Python  ·  v2 changes are highlighted in red.
Mon May 18 Sun May 24 a2.pdf  ·  a2.zip (contains the LaTeX template + code)
Mon May 25 Sun May 31 a3.pdf  ·  a3.zip (contains the LaTeX template + code)
Mon Jun 1st Sun Jun 7th a4.pdf  ·  a4.zip (contains the LaTeX template + code)
Mon Jun 8th Sun Jun 14th a5.pdf  ·  a5.zip (contains the LaTeX template + code)
Mon Jun 15th Sun Jun 21st a6.pdf  ·  a6.zip (contains the LaTeX template + code) Due to time constraints, no late submissions are allowed.

Previous Offerings