CPSC 340 and 532M - Machine Learning and Data Mining (Fall 2022)
Lecture Sections (beginning September 6):
- 12-1pm (Monday/Wednesday/Friday in UBC Life Building 2201)
- 4-5pm (Monday/Wednesday/Friday in UBC Life Building 2302)
Instructor: Mark Schmidt
Instructor office hours: TBA
Tutorials (beginning September 12):
- Mondays 1-2pm (DMP 101)
- Mondays 5-6pm (DMP 201)
- Tuesdays 1-2pm (MacLeod 3008)
- Tuesdays 4-5pm (MacLeod 2012)
- Wednesdays 12-1pm (MacLeod 3014)
- Thursdays 9-10am (DMP 101)
- Thursdays 10-11am (DMP 101)
- Thursdays 12-1pm (MacLeod 3014)
- Thursdays 1-2pm (MacLeod 3014)
Teaching assistants: TBA
TA office hours (all in Demco Learning Centre): TBA
Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We will focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.
Registration: Undergraduate and graduate students from any department are welcome to take the course. Undergraduate students should enroll in CPSC 340 while graduate students should enroll in CPSC 532M (which has an extra small project component). Below are more details on registration for each course:
- The majority of the seats in 340 are reserved for undergraduate computer science majors. Other students need to sign up for the wait list to enroll in the course. Note that all students on the wait list are typically accepted into the course.
- 532M currently has 60 spots available, but this will likely grow. Note that you will likely get a warning about prerequisites when you register (which you can ignore), but depending on your department you may later be contacted and asked to show how you satisfy the relevant prerequisites. As with 340, if 532M becomes full then signing up for the wait list is the only way to enroll in the course. In previous years all students on the wait list were ultimately accepted into the course.
Starting in the second week of classes, we will have weekly tutorials run by the TAs. These will do things like go through provided assignment code, review background material, review big concepts, and/or do exercises. You can register for particular tutorial sections if you want to save a seat at a particular time, but note that you do not need to register in a tutorial section.
Prerequisites:
- Basic algorithms and data structures (CPSC 221, or both of CPSC 260 and EECE 320 as well as one of CPSC 210, EECE 201, or EECE 309).
- Linear algebra (one of MATH 152, 221, or 223).
- Probability (one of STAT 241, STAT 251, ECON 325, ECON 327, MATH 302, STAT 302, or MATH 318).
- Multivariate calculus (one of MATH 200, 217, 226, 253, or 263).
Students who do not meet these requirements should consider taking CPSC 330 ("Applied Machine Learning").
Textbook: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP) which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.
Related Courses:
The most closely related course is CPSC 330: Applied Machine Learning. This course has fewer prerequisites and covers some of the same material, but focuses more on applications than on understanding ML ideas in depth.
Related courses in statistics include: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461 (as well as EOSC 510). A discussion of the differences between CPSC 340 and these various STAT classes written by a former student (Geoff Roeder) is available here (this was written in 2016 so may be out of date).
Grading:
- 340: Assignments 30%, Midterm 20%, Final 50%.
- 532M: Assignments 25%, Midterm 15%, Final 40%, Project 20%.
List of topics
We will roughly cover the following topics:
- Data representation and summarization.
- Supervised learning with frequencies and distances.
- Data clustering, outlier detection, and association rules.
- Linear prediction, regularization, and kernels.
- Latent-factor models and collaborative filtering.
- Neural networks and deep learning.
Lectures, Assignments, Related Readings, and Links
Mike's Demos
In semesters where Mike Gelbart teaches the course, he uses a variety of Python notebooks. Julia versions of these notebooks are available here.
Related courses that have online notes