CPSC 532D: Modern Statistical Learning Theory

Previously offered in in 2022W1 and (with the name 532S) in 2021W2; this instance will be broadly similar.

Schedule

Italicized entries are tentative. The book acronyms are described here. As a general rule:

SSBD is the easiest to understand but sometimes over-simplified (sometimes in ways that make it incorrect);
MRT can be written in a more opaque way, but is at about the level of detail we mostly use;
Bach is maybe similar to MRT, in my opinion easier to follow but has a somewhat different focus;
Zhang is usually at a more advanced, and often more abstract, level than we mostly cover.

Date		Topic	Supplements
Tu	Sep 5	No class: Imagine Day
Th	Sep 7	Course intro, ERM	SSBD 1-2; MRT 2
F	Sep 8	Assignment 1 posted: pdf, tex
Tu	Sep 12	Class canceled: sick
Th	Sep 14	Uniform convergence with finite classes [Online: sick]	SSBD 2-4; MRT 2
M	Sep 18	Assignment 1 due at noon
M	Sep 18	Drop deadline
Tu	Sep 19	Concentration inequalities	SSBD B; MRT D Zhang 2; Wainwright 2
Th	Sep 21	PAC learning; covering numbers	SSBD 3, MRT 2 Bach 4.4.4, Zhang 3.4/4/5
Sa	Sep 23	Assignment 2 posted: pdf, tex
Tu	Sep 26	Rademacher complexity	MRT 3.1; SSBD 26; Bach 4.5; Zhang 6
Th	Sep 28	Rademacher complexity	MRT 3.1; SSBD 26; Bach 4.5; Zhang 6
Tu	Oct 3	VC dimension	SSBD 6; MRT 3.2-3.3
Th	Oct 5	finish VC; No Free Lunch; “Fundamental Theorem”	SSBD 5; MRT 3.4 Bach 4.6 / 12; Zhang 12
Tu	Oct 10	Structural Risk Minimization / Min Description Length	SSBD 7; MRT 4
W	Oct 11	Assignment 2 due at midnight
Th	Oct 12	No class: UBC follows a Monday schedule
Tu	Oct 17	finish SRM, MDL; briefly start margins
Th	Oct 19	Margins, SVMs	MRT 5; SSBD 15, 26
M	Oct 23	Assignment 3 posted: pdf, tex
Tu	Oct 24	More margins/SVMs	MRT 5; SSBD 15, 26
Th	Oct 26	Kernels	Bach 7, MRT 6, SSBD 16
F	Oct 27	Withdrawal deadline
Tu	Oct 31	More kernels
Th	Nov 2	Talk by Yejin Choi on limits of LLMs, Fred Kaiser Building 2020/2030
Tu	Nov 7	Universal approximation	Telgarsky 2; SSBD 20; Bach 9.3; SC 4.6
Th	Nov 9	Finish approximation; Is ERM enough? [Online: at a workshop]
F	Nov 10	Assignment 3 due at midnight
Tu	Nov 14	No class: midterm break
Th	Nov 16	Stability, regularization, convex problems	SSBD 12-13, MRT 14
Tu	Nov 21	Stability, regularization, convex problems	SSBD 12-13, MRT 14
Th	Nov 23	(Stochastic) gradient descent	SSBD 14, Bach 5
M	Nov 27	Assignment 4 posted: pdf, tex
Tu	Nov 28	Nonconvex optimization, start neural tangent kernels
Th	Nov 30	Neural tangent kernels	Telgarsky 4, Bach 11.3
Tu	Dec 5	Implicit regularization	Bach 11.1
Th	Dec 7	Grab-bag
M	Dec 18	Final exam (in person, handwritten) — 1-3:30pm, ICCS 246
W	Dec 20	Assignment 4 due at midnight

Logistics

The course meets in person in Swing 210, with possible rare exceptions (e.g. if I get sick but can still teach, I'll move it online). Note that this room does not have a recording setup.

Grading scheme: 70% assignments, 30% final.

There will be four or five written assignments through the term; answers should be written in LaTeX, and handed in on Gradescope. There will also be a small number (one or two) of assignments that involve reading a paper, reacting to it, and poking at it slightly further; details to come.

Overview

The brief idea of the course: when should we expect machine learning algorithms to work? What kinds of assumptions do we need to be able to be able to rigorously prove that they will work?

Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities, margin bounds, stability. Also, most of: PAC-Bayes, analysis of kernel methods, limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.

Prerequisites

There are no formal prerequisites. I will roughly assume:

Basic "mathematical maturity": familiarity with reading and writing proofs, recognizing a valid proof, etc. If you've taken a third-year math course or similar, you should be fine.
Comfort with linear algebra, multivariable calculus, basic probability theory, and basic analysis of algorithms.
Ideally, a basic understanding of machine learning, as in CPSC 340. If you don't have this, you should still be able to get by, but might have to do a little more reading on your own.
Basic mathematical programming ability in any language: being able to plot functions, etc.
Ideally, familiarity with programming in a machine learning / statistical context, e.g. being comfortable with numpy and PyTorch/TensorFlow/etc. This is not required, but there will some assignment options may be easier / more fruitful / more fun if you're comfortable with it.

If you have any specific questions about your background, feel free to ask.

Resources

Books that the course will definitely pull from:

[SSBD] Understanding Machine Learning: From Theory to Algorithms (Shai Shalev-Shwartz, Shai Ben-David; 2014) – a very readable (free) book that covers a significant portion of our material
[MRT] Foundations of Machine Learning (Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwakar; second edition 2018) – free textbook, a little more "graduate-oriented", that covers certain topics (especially Rademacher complexity) in more depth

New books where I may or may not pull from sections, TBD:

[Bach] Learning Theory from First Principles (Francis Bach; 2023 draft)
[Zhang] Mathematical Analysis of Machine Learning Algorithms (Tong Zhang; 2023 pre-publication version)

Some other points of view you might like:

Introduction to Statistical Learning Theory (Olivier Bousquet, Stéphane Boucheron, Gábor Lugosi; 2003) – 40-page survey of classics
On the Mathematical Foundations of Learning (Felipe Cucker, Steve Smale; 2001) – 50-page survey of classics
An Introduction to Computational Learning Theory (Michael Kearns, Umesh Vazirani; 1994; that link should get you a copy with a UBC login) – a book from a much more CS theory point of view. A nice complement to what we'll cover in this course.

If you need to refresh your linear algebra or other areas of math:

Mathematics for Machine Learning (Marc Deisenroth, Aldo Faisal, Cheng Soon Ong; 2020) – a nice overview of basics.

Measure-theoretic probability is not required for this course, but there are instances and related areas where it could be helpful:

A Measure Theory Tutorial (Measure Theory for Dummies) (Maya Gupta) – 5 pages, just the basics
Measure Theory, 2010 (Greg Hjorth) – 110 pages but comes recommended as both thorough and readable
A Probability Path (Sidney Resnick) – frequently recommended textbook aimed at non-mathematicians to learn it in detail, but it's a full-semester textbook scale of detail; available if you log in via UBC
There are also lots of other places, of course; e.g. the probability textbooks by Billingsley, Klenke, and Williams are (I think) classics.

Similar courses:

UBC CPSC 531H: Machine Learning Theory (Nick Harvey, not taught for a while)
MIT 9.520: Statistical Learning Theory and Applications (Tomaso Poggio, Loreno Rosasco, Alexander Rakhlin, Ardrzej Banburski)
TTIC 31120: Computational and Statistical Learning Theory (Nati Srebro)
CMU 10-806: Foundations of Machine Learning and Data Science (Nina Balcan, Avrim Blum)
UIUC ECE 598MR: Statistical Learning Theory (Maxim Raginsky)
Stanford STATS214: Machine Learning Theory (Tengyu Ma)

CPSC 532D: Modern Statistical Learning Theory – Fall 2023 (2023W1)

Schedule

Logistics

Overview

Prerequisites

Resources