CPSC 532D: Statistical Learning Theory

Previously offered in 24W1, 23W1, 22W1, and (with the name 532S) 21W2; this instance will be broadly similar.

This is a course on the mathematical foundations of machine learning. When should we expect ML algorithms to work (or not work), and what kinds of assumptions do we need to make to rigorously prove this?

Schedule

Italicized entries are tentative. The lecture notes are self-contained, but the supplements column also refers to the following books (all available as free pdfs) for more details / other perspectives:

SSBD (2014) is the easiest to understand but sometimes over-simplified (sometimes in ways that make it incorrect);
MRT (second edition in 2018) can be written in a more opaque way, but is at about the level of detail we mostly use;
Bach (2024) is maybe similar to MRT, in my opinion easier to follow but has a somewhat different focus (more stats-y than CS-y);
Zhang (2023) is usually at a more advanced, and often more abstract, level than we mostly cover.

Lecture notes are available as individual chapters linked below, or as one big frequently-updated file.

Date		Topic	Supplemental Reading
M	Sep 1	No class: Labour Day
W	Sep 3	Course intro, ERM	SSBD 1-2; MRT 2; Bach 2
W	Sep 3	Assignment 1 posted — loss functions, ERM, background
M	Sep 8	Uniform convergence: finite classes	SSBD 2-4; MRT 2; Bach 4.4
W	Sep 10	Concentration inequalities	SSBD B; MRT D; Bach 1.2 Zhang 2; Wainwright 2
M	Sep 15	Covering numbers	Bach 4.4.4; SSBD 27 Zhang 3.4/4/5
M	Sep 15	Assignment 1 due at 11:59pm
M	Sep 15	Drop deadline
M	Sep 15	Assignment 2 posted — finite classes, concentration, covering numbers
W	Sep 17	Rademacher complexity	MRT 3.1; SSBD 26 Bach 4.5; Zhang 6
M	Sep 22	More Rademacher
W	Sep 24	VC dimension	SSBD 6; MRT 3.2-3.3
M	Sep 29	No free lunch	SSBD 5; MRT 3.4 Bach 4.6/12; Zhang 12
W	Oct 1	PAC learning, the “fundamental theorem”	SSBD 5
F	Oct 3	Assignment 2 due at 11:59pm
F	Oct 3	Assignment 3 posted — Rademacher, VC, PAC
M	Oct 6	Online learning	SSBD 21; MRT 8
W	Oct 8	More online learning
M	Oct 13	No class: Thanksgiving Day
W	Oct 15	Approximation error: structural risk minimization, min description length	SSBD 7; MRT 4
M	Oct 20	More approximation error: universal approximators	Telgarsky 2; SSBD 20 Bach 9.3; SC 4.6
W	Oct 22	Kernels	Bach 7, MRT 6, SSBD 16
F	Oct 24	Assignment 3 due at 11:59pm
F	Oct 24	Withdrawal deadline
F	Oct 24	Assignment 4 posted — online, SRM
M	Oct 27	More kernels
W	Oct 29	Margins
M	Nov 3	More margins
W	Nov 5	Is ERM enough?; optimization
F	Nov 7	Assignment 4 due at 11:59pm
F	Nov 7	Paper-reading assignment: choice of papers posted
M	Nov 10	No class: midterm break
W	Nov 12	No class: midterm break
F	Nov 14	Assignment 5 posted — kernels, margins
M	Nov 17	More optimization
W	Nov 19	Neural tangent kernels
M	Nov 24	Implicit regularization
W	Nov 26	Stability
M	Dec 1	PAC-Bayes
?	?	Paper-reading assignment: self-scheduled appointment availability tba, maybe during Nov 20 — Dec 1 or similar
W	Dec 3	Grab-bag/wrap-up, or possibly canceled for NeurIPS
F	Dec 5	Assignment 5 due at 11:59pm
?	Dec ??	Final exam (in person, handwritten); date/time tbd but will be during Dec 9-20 inclusive

Logistics

The course meets in person in Swing 206, with possible rare exceptions (e.g. if I get sick but can still teach, I'll move it online). Note that this room does not have a recording setup, but if you need to miss class for some reason and would prefer some form of a janky recording to just reading the lecture notes, talk to me.

Grading scheme: 70% assignments, 30% final.

There will be four or five assignments involving proving and exploring various things related to the course material through the term. These should be written in LaTeX and handed in on Gradescope; expect each one to take a significant amount of time. These will mostly be proofs / similar math and conceptual questions; there may occasionally be some light programming involved, or maybe not. Nothing major on the programming side, just exploring some concepts / running some experiments to see how it interacts with the math.
There will be one or two assignments that involve reading a paper, reacting to it, poking at its assumptions slightly further, and probably answering questions about it orally; details to come.
The final will be a traditional handwritten, in-person exam.

Registration info

There may be a waiting list for this course. My strong expectation and hope, based on every previous time I have taught it, is that everyone who wants to take the course will get into it; typically enough people drop, and if not we can expand the section size somewhat. If you're interested in the course, come to class even if you haven't officially been admitted into it yet.

Auditors are welcome, assuming . Please formally audit the course unless there's a reason you can't; talk to me if so.

Undergraduates are also welcome if they feel they meet the informal prerequisites; several undergraduates have taken, enjoyed, and done well in previous offerings of this course. There is a form to fill out; see instructions here. Typically, this process is entirely straightforward (but takes some time) if you've done at least 75% of the 300- and 400-level credits required for your degree, have an average of at least 76% in them, and have taken at most one previous graduate course for credit. If you don't fall in those situations, talk to me.

Prerequisites

There are no formal prerequisites. TL;DR: if you've done well in CPSC 340/540 or 440/550, didn't struggle with the probability stuff there, and are also pretty comfortable with proofs, you'll be fine. If not, keep reading.

I will roughly assume the following; if you're missing one of them, you can probably get by, but if you're missing multiple, talk to me about it.

Basic "mathematical maturity": familiarity with reading and writing proofs, recognizing a valid proof, etc. Math 220/226 is good, as is having taken a proofs-based math course at non-UBC places (typically analysis / abstract algebra / some linear algebra courses, but this will vary); CPSC 121, maybe third-year or up theoretical CS / stat courses, etc are all probably also fine. You should probably know roughly the content covered in, say, PLP: an introduction to mathematical proof or the Book of Proof.
Ideally, a basic understanding of machine learning, as in CPSC 340, CPEN 355, or probably CPSC 330/related courses. If you don't have this, you should still be able to get by, but might have to do a little more reading on your own.
Probability: basic comfort with manipulating probability statements, etc. Anyone having done well in CPSC 420, 436R/536N, 440/550, 532W, most graduate Stat classes, many graduate CS theory classes, etc should be fine. Ideally, Math 302/318/418 or Stat 302. Also probably should be fine: Stat 241/251, Econ 325/327.
Linear algebra: ideally Math 310, but Math 152/221/223 are fine; anyone who's done well in CPSC 340/540 or 440/550 should probably be fine. We don't need anything special here, I'm just going to write matrix expressions and assume you know what's going on. Later in the course we will use a little bit of abstract vector spaces / Hilbert spaces, but we'll cover that if you haven't seen it before.
Multivariate calculus: maybe Math 200/217/226/253/263; anyone who's done well in CPSC 340/540 or 440/550 should be fine. Nothing special here either, I'm just not going to explain what a gradient is.
Analysis of algorithms: CPSC 221 (basic algorithms + data structures) should be fine to get by, though 421 (NP-completeness, etc) would be the best case. Any statisticians or mathematicians who haven't taken this, you should be fine.
Basic mathematical programming ability in any language: being able to plot functions, etc.
- Ideally, familiarity with programming in a machine learning / statistical context, e.g. being comfortable with numpy and PyTorch/TensorFlow/etc. This is not required, but there may be some options may be easier / more fruitful / more fun if you're comfortable with it.
In the absolute best case, real and functional analysis (eg Math 320, 321, 420, 421). This is very much not required, and most people who've succeeded in the course haven't taken this, but we use a few tools from these fields. I'll explain what we need from them as we see them.

If you have any specific questions about your background, please ask.

Resources

If you need to refresh your linear algebra or other areas of math:

Mathematics for Machine Learning (Marc Deisenroth, Aldo Faisal, Cheng Soon Ong; 2020) – a nice overview of basics.
Linear Algebra Done Right (Sheldon Axler, fourth edition 2024) – a great introduction towards the abstract sides of linear algebra; the book I used for self-study as a teenager.

In addition to the books above, some other points of view you might like:

Mathematical Foundations of Machine Learning (Robert Nowak; 2022) – course notes.
Deep learning theory lecture notes (Matus Telgarsky; 2021) – course notes.
Introduction to Statistical Learning Theory (Olivier Bousquet, Stéphane Boucheron, Gábor Lugosi; 2003) – 40-page survey of classics.
On the Mathematical Foundations of Learning (Felipe Cucker, Steve Smale; 2001) – 50-page survey of classics.
An Introduction to Computational Learning Theory (Michael Kearns, Umesh Vazirani; 1994; that link should get you a copy with a UBC login) – a book from a much more CS theory point of view. A nice complement to what we'll cover in this course.

Measure-theoretic probability is not required for this course, but there are instances and related areas where it could be helpful:

A Measure Theory Tutorial (Measure Theory for Dummies) (Maya Gupta) – 5 pages, just the basics
Measure Theory, 2010 (Greg Hjorth) – 110 pages but comes recommended as both thorough and readable
A Probability Path (Sidney Resnick) – frequently recommended textbook aimed at non-mathematicians to learn it in detail, but it's a full-semester textbook scale of detail; available if you log in via UBC
There are also lots of other places, of course; e.g. the probability textbooks by Billingsley, Klenke, and Williams are (I think) classics.

Some similar courses:

UBC CPSC 531H: Machine Learning Theory (Nick Harvey, not taught for a while)
MIT 9.520: Statistical Learning Theory and Applications (Tomaso Poggio, Loreno Rosasco, Alexander Rakhlin, Ardrzej Banburski)
TTIC 31120: Computational and Statistical Learning Theory (Nati Srebro)
CMU 10-806: Foundations of Machine Learning and Data Science (Nina Balcan, Avrim Blum)
UIUC ECE 598MR: Statistical Learning Theory (Maxim Raginsky)
Stanford STATS214: Machine Learning Theory (Tengyu Ma)

Policies

Academic integrity

The vast majority of you are grad students. The point of grad school classes is to learn things; grades are not the point. For PhD students, they're almost totally irrelevant to your life; for master's students, they barely matter if at all. Cheating, or skirting the boundaries of cheating, does not assist with your learning, but it does potentially set you down a slippery slope towards research miscoduct, which undermines the whole enterprise of science. So, don't cheat; talk to me about what you're struggling with that's making you feel like you need to cheat, and we can figure something out. (Or just take the lower grade.)

You can read more about general UBC policies: Academic Honesty, Academic Misconduct, and Disciplinary Measures. You should also read the Student Declaration and Responsibility to which you've already implicitly agreed.

For the purposes of this class specifically:

Do not ask friends or look online for old solution files. If you somehow happen across one, don't read it, and let me know right away how you came by it.
Do not share any assignment solution files, whether the ones we release or your own writeups. Sharing includes giving them to friends, uploading them to public websites, etc.
- If you think you did something especially cool/nonstandard and want to show it off, it's definitely okay to do that with people currently in the class or who have taken it already. If you want to use it as a part of something else, talk to me first!
Do not use generative AI tools (ChatGPT/etc) to answer assignment questions.
- It is okay to use these tools for pure content-neutral help with LaTeX formatting, or Grammarly-type tools for grammar/etc. It is not okay to use them in any content-dependent way that has to do with your actual proof/etc. If you're not sure, ask.
Do not otherwise specifically seek out answers to the assignment questions; there will be a few that are available in solution manuals, textbooks, papers, etc if you go looking, so don't specifically go looking.
You can look at general reference materials, e.g. the materials linked on the course site or just looking up topics we're covering. If you happen across a problem solution when you're doing that, that's okay; just don't copy it, write the solution yourself, and say what happened including a link/reference to the specific source. You won't lose points for doing this, but try not to have it happen!
You can discuss general ideas, strategies, techniques etc. of the assignments with peers.
- If you do, write up the solution alone without notes and ensuring you fully understand it.
- If you do, put a little note in your solution about who you talked to and to what extent. (You won't lose points for this, please tell me!)
- Do not sit down and write out the exact solution together unless you're an official assignment partner (which is not allowed on assignment 1).
When you have an explicit assignment partner:
- It's fine to either sit down and solve things together, or to do things separately and then merge answers.
- Do not say “you do problem 1, I'll do problem 2.” By putting your name on an assignment, you are pledging that you fully understand and contributed at least in part to everything you hand in.
- If you end up doing part of the assignment together and part separately, then hand in separate solution files and put a note on each question where it's relevant like “I did this problem with Alice.”
For the exam, standard closed-book exam rules apply:
- I'll tell you what notes you can bring to the exam.
- No referring to any other materials, looking over someone's shoulder, “going to the washroom” and instead running to a friend's computer to google something, etc.
If you're not sure what's allowed and what's not, ask!

Penalties for academic dishonesty can be quite severe (including failing the course or being expelled from the program), and I can say from experience that the process is very unpleasant for everyone involved. Please just don't; talk to me instead, and we can work something out.

Positive Space

This course, including class / Piazza / office hours / etc, is a positive space both in the specific sense linked there and also in that I truly hope everyone can feel safe and supported in the class. If anyone is making you feel uncomfortable in any way, please let me know immediately. If I'm the one causing the issue or you would otherwise rather speak to someone other than me, please immediately talk to your departmental advisor or a CS department head (currently Margo Seltzer and Joanna McGrenere; email head@cs.ubc.ca.)

CPSC 532D: Statistical Learning Theory – Fall 2025 (25W1)