CPSC 532D: Modern Statistical Learning Theory – 2022W1

There's a more recent version of this course.

Instructor: Danica Sutherland (she): dsuth@cs.ubc.ca, ICICS X563. TA: Milad Jalali.
Lecture info: Mondays/Wednesdays, 15:00 - 16:20, Orchard Commons 3004.
Office hours (Milad): Fridays 16:00-17:00, ICCS X150 table 3.
Piazza (or other link if that Canvas one doesn't work for whatever reason).

Previously offered in 2021W2 (under the name CPSC 532S); this instance will be broadly similar.

Schedule

Italicized entries are tentative. The books SSBD, MRT, and Tel are described further here.

Lecture notes are here and are irregularly updated as we go. N2 below refers to section 2 of the notes. (I might split the files later, but for now it's all one pdf.)

Date | Topic | Material | Supplements
W Sep 7 | Course intro; start finite hypothesis classes | intro; N1-2 | SSBD 1-2; MRT 2
Th Sep 8 | Assignment 1 posted: pdf, tex
M Sep 12 | Class canceled (sick)
W Sep 14 | PAC learning: definitions, finite hypothesis classes | slides; N2 | SSBD 2-4; MRT 2
M Sep 19 | No class: National day of mourning
T Sep 20 | Assignment 1 due at noon
T Sep 20 | Drop deadline
W Sep 21 | Uniform convergence, concentration inequalities | N2-3 | SSBD 4, B; MRT 2, D; Tel 12; Wainwright 2
Sa Sep 24 | Assignment 2 posted: pdf, tex
M Sep 26 | More uniform convergence: Rademacher complexity | N4 | MRT 3; SSBD 26; Tel 13
W Sep 28 | More Rademacher | N4 | MRT 3; SSBD 26; Tel 13
M Oct 3 | No Free Lunch Theorem; very start of VC dimension [guest lecturer: Nick Harvey] | N5 | SSBD 5; MRT 3
W Oct 5 | VC dimension [guest lecturer: Nick Harvey] | N6.1-6.3 | SSBD 6; MRT 3
M Oct 10 | No class: Thanksgiving
W Oct 12 | Assignment 2 due at noon
W Oct 12 | The fundamental theorem of statistical learning | N6.4-6.5 | SSBD 6, 28; MRT 3
M Oct 17 | Structural risk minimization | N7 | SSBD 7; MRT 4
W Oct 19 | MDL, consistency, start of margins | N7-8 | SSBD 7
M Oct 24 | Margins and SVMs | N9 | MRT 5; SSBD 15, 26
W Oct 26 | More SVMs | N9 | MRT 5; SSBD 15, 26
F Oct 28 | Assignment 3 posted: pdf, tex
F Oct 28 | Withdrawal deadline
M Oct 31 | Kernels I: setup | | MRT 6; SSBD 16
W Nov 2 | Kernels II: Moore-Aronszajn, representer theorems | | MRT 6; SSBD 16
M Nov 7 | Kernels III: examples, algorithms | | MRT 6; SSBD 16
T Nov 8 | Assignment 3 due at 11:59pm
W Nov 9 | No class: midterm break
M Nov 14 | Kernels IV: regularization, operators
W Nov 16 | Stability + convex learning problems | | SSBD 12, 13
Su Nov 20 | Assignment 4 posted: pdf, tex
Su Nov 20 | Assignment 5 posted: pdf, tex
Su Nov 20 | Paper reading assignment instructions posted
M Nov 21 | (Stochastic) gradient descent analysis | | SSBD 14
W Nov 23 | Finish SGD analysis; start implicit regularization | | SSBD 14
M Nov 28 | More implicit regularization / double descent; start NTK [Online: at NeurIPS] | slides | BHMM / NKBYBS / Tel 10
W Nov 30 | NTK [Online: at NeurIPS] | slides | Tel 4, 8
M Dec 5 | Universality and generalization in deep learning [Online: sick] | slides | Tel 2, 14
W Dec 7 | Grab bag: failures of uniform convergence; PAC-Bayes; online learning [Online: sick] | slides | NK; SSBD 31 / Guedj; SSBD 21
F Dec 9 | Paper reading assignment: proposal due by noon
Sa Dec 10 | Assignments 4 and 5 due at 11:59pm
W Dec 14 | Final exam (in person in ICCS 246, handwritten), 14:00 - 16:30
W Dec 21 | Reading assignment due at noon

Logistics

The course meets in person in Orchard Commons 3004, with occasional rare exceptions.

Grading scheme: 70% assignments, 30% final.

The lowest assignment grade (not including the project) will be dropped. Assignments should be done in LaTeX – not handwritten or in a word processor. Hand them in on Gradescope; more details to come soon.

One assignment, late in the term, will be a "mini-project": reading a paper and, for example, running a small exploratory experiment or doing a detailed exploration of its assumptions. Suggested papers and details to come later.

Overview

The brief idea of the course: when should we expect machine learning algorithms to work? What kinds of assumptions do we need to be able to rigorously prove that they will work?

Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities. Probably: PAC-Bayes, analysis of kernel methods, margin bounds, stability. Maybe: limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.
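For a taste of the flavour of result we'll start from, here is a sketch of the standard uniform convergence bound for a finite hypothesis class (assuming a loss bounded in [0, 1] and m i.i.d. samples; L_D denotes the population risk and L_S the empirical risk, roughly following SSBD's notation). With probability at least 1 - \delta,

\sup_{h \in \mathcal{H}} \bigl| L_{\mathcal{D}}(h) - L_S(h) \bigr| \le \sqrt{\frac{\ln\lvert\mathcal{H}\rvert + \ln(2/\delta)}{2m}},

which follows from Hoeffding's inequality applied to each h plus a union bound over the finitely many hypotheses in \mathcal{H}. Much of the course is about what replaces \ln\lvert\mathcal{H}\rvert when the class is infinite.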

Prerequisites

There are no formal prerequisites. I will roughly assume:

If you have any specific questions about your background, feel free to ask.

Resources

Books where we'll use sections in some detail:

Some other points of view you might like:

If you need to refresh your linear algebra or other areas of math:

Measure-theoretic probability is not required for this course, but there are instances and related areas where it could be helpful:

Similar courses: