CPSC 532S: Modern Statistical Learning Theory – 2021-22 W2

Instructor: Danica Sutherland (she/her): dsuth@cs.ubc.ca, ICICS X563.
Lecture info: Mondays/Wednesdays, 13:30 - 15:00, DMP 101 (also available on Zoom).
Office hours: Mondays 12-1pm, Thursdays 4-5pm (ICICS X563 or come to the class Zoom).
Or request another time; schedule with a private Piazza post (check my potential availability calendar if you have a CS login).
Also feel free to just ask your question on Piazza, where the answer could help your classmates too, depending on the kind of question.

Canvas and Piazza (easiest registration if you follow the link from Canvas, but you can sign up directly here)


SSBD below refers to the book of Shalev-Shwartz and Ben-David; MRT to that of Mohri, Rostamizadeh, and Talwalkar. Italicized entries are tentative.

Lecture 1 – Mon Jan 10: Intro / overview. Readings: SSBD 1-2; MRT 2
Mon Jan 10: Assignment 1 posted (and .tex)
Lecture 2 – Wed Jan 12: PAC. Readings: SSBD 2-3; MRT 2
Lecture 3 – Mon Jan 17: Probability / uniform convergence. Readings: Measure Theory Tutorial
Lecture 4 – Wed Jan 19: Finish uniform convergence + no free lunch + start of VC. Readings: SSBD 5-6
Thu Jan 20: Assignment 1 due, 11:59pm – solutions
Fri Jan 21: Drop deadline
Lecture 5 – Mon Jan 24: More on VC dimension. Readings: SSBD 6; MRT 3
Lecture 6 – Wed Jan 26: More VC + Rademacher. Readings: SSBD 9.1; MRT 3
Lecture 7 – Mon Jan 31: More Rademacher (some fiddly issues with abs value; update on Wednesday). Readings: MRT 3; SSBD 26
Lecture 8 – Wed Feb 2: Even more Rademacher. Readings: MRT 3, 11; SSBD 26
Fri Feb 4: Assignment 2 posted (and .tex)
Lecture 9 – Mon Feb 7: Structural risk minimization. Readings: SSBD 7; MRT 4
Lecture 10 – Wed Feb 9: Modes of learnability + model selection, plus the long-awaited proof of Massart's lemma. Readings: SSBD 7, 11; MRT 4
Mon Feb 14: Shift to hybrid mode (delayed by being sick)
Lecture 11 – Mon Feb 14: Convex learning problems + gradient descent. Readings: SSBD 12, 14; Bubeck; Boyd/Vandenberghe
Lecture 12 – Wed Feb 16: SGD. Readings: SSBD 14
Fri Feb 18: Assignment 2 due, 11:59pm – solutions
Mon Feb 21 and Wed Feb 23: Midterm break
Lecture 13 – Mon Feb 28: Regularization + stability. Readings: SSBD 13; MRT 14
Lecture 14 – Wed Mar 2: SVMs + margin bounds. Readings: SSBD 15, 26.3; MRT 5
Lecture 15 – Mon Mar 7: SVM duality, kernel definitions. Readings: SSBD 15/16; MRT 5/6; more kernel stuff linked in slides
Lecture 16 – Wed Mar 9: More kernels: representer theorem, kernel ridge
Mon Mar 14: Assignment 3 posted (and .tex)
Lecture 17 – Mon Mar 14: Some more kernels (universality, Gaussian processes) + deep learning (approximation, generalization). Readings: Telgarsky section 2
Lecture 18 – Wed Mar 16: More deep learning approximation + generalization. Readings: Telgarsky section 14
Wed Mar 16: Project proposals due
Mon Mar 21: Class canceled
Lecture 19 – Wed Mar 23: Neural tangent kernels. Readings: Telgarsky sections 4, 8
Lecture 20 – Mon Mar 28: “Does any of this stuff work at all?” Limits of NTK + interpolation and the limits of uniform convergence
Mon Mar 28: Assignment 3 due (extended), 11:59pm – solutions
Tue Mar 29: Assignment 4 posted (and .tex)
Lecture 21 – Wed Mar 30: Double descent and implicit regularization + PAC-Bayes. Readings: BHMM / NKBYBS / Telgarsky 10; SSBD 31 / Guedj
Lecture 22 – Mon Apr 4: Online learning
Wed Apr 6: Project presentations
Fri Apr 8: Project writeups due
Fri Apr 8: Assignment 4 due, 11:59pm – solutions
TBD: Take-home final (there will be a significant window to do it during the finals period)


The course meets in person in DMP 101 (since Feb 14th) and is also available on Zoom: the meeting link and recordings are available on Canvas and Piazza.

Grading scheme: 70% assignments (including a small project), 30% final.

The lowest assignment grade (not including the project) will be dropped. The project counts as one assignment. Assignments should be done in LaTeX – not handwritten or in a word processor. Hand in on Gradescope, as described on Piazza.

There will be one “big assignment” which serves as a (small) project: something on the scale of doing some experiments to explore a paper, doing a lit review in a particular area, extending / unifying a few papers, etc. A proposal will be due beforehand; details to come.

The final exam may be take-home, synchronous online, or in-person; TBD.

There may also be some paper presentations later in the course, in which case the paper presenters will be able to use that to replace part of an assignment grade. This is dependent on the COVID situation and other factors; TBD.


The brief idea of the course: when should we expect machine learning algorithms to work? What kinds of assumptions do we need to be able to rigorously prove that they will work?

Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities. Probably: PAC-Bayes, analysis of kernel methods, margin bounds, stability. Maybe: limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.
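To give a taste of the flavor of results we'll prove: one of the first theorems in the PAC-learning material (it appears early in SSBD) bounds the sample complexity of learning a finite hypothesis class in the realizable setting. A sketch of the statement:

```latex
% PAC learning a finite hypothesis class H (realizable case):
% if the number of i.i.d. training samples m satisfies
%
%     m >= log(|H| / delta) / epsilon,
%
% then with probability at least 1 - delta, every ERM hypothesis
% h has true error at most epsilon:
\[
    m \;\ge\; \frac{\log\bigl(\lvert\mathcal{H}\rvert / \delta\bigr)}{\varepsilon}
    \quad\Longrightarrow\quad
    \Pr\bigl[\, L_{\mathcal{D}}(h_{\mathrm{ERM}}) \le \varepsilon \,\bigr] \;\ge\; 1 - \delta .
\]
```

Much of the course develops tools (VC dimension, Rademacher complexity, concentration inequalities) that extend this style of guarantee to infinite hypothesis classes and to the non-realizable (agnostic) setting.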

There will be some overlap with CPSC 531H: Machine Learning Theory (Nick Harvey's course, last taught in 2018), but if you've taken that course, you'll still get something out of this one. We'll cover less on optimization / online learning / bandits than that course did, and try to cover some more recent ideas used in contemporary deep learning theory.

(This course is unrelated to CPSC 532S: Multimodal Learning with Vision, Language, and Sound, from Leon Sigal.)


There are no formal prerequisites. I will roughly assume:

If you have any specific questions about your background, feel free to ask.


Learning theory textbooks and surveys:

If you need to refresh your linear algebra or other areas of math:

Resources on learning measure-theoretic probability (not required to know this stuff in detail, but you might find it helpful):

Similar courses: