CPSC 532D: Modern Statistical Learning Theory – 2022W1

There's a more recent version of this course.

Instructor: Danica Sutherland (she): dsuth@cs.ubc.ca, ICICS X563. TA: Milad Jalali.
Lecture info: Mondays/Wednesdays, 15:00 - 16:20, Orchard Commons 3004.
Office hours (Milad): Fridays 16:00-17:00, ICCS X150 table 3.
Piazza (or other link if that Canvas one doesn't work for whatever reason).

Previously offered in 2021W2 (under the name CPSC 532S); this instance will be broadly similar.

Schedule

Italicized entries are tentative. The books SSBD, MRT, and Tel are described further here.

Lecture notes are here and are irregularly updated as we go. N2 below refers to section 2 of the notes. (I might split the files later, but for now it's all one pdf.)

Date | Topic | Material | Supplements
W Sep 7 | Course intro; start finite hypothesis classes | intro; N1-2 | SSBD 1-2; MRT 2
Th Sep 8 | Assignment 1 posted: pdf, tex
M Sep 12 | Class canceled (sick)
W Sep 14 | PAC learning: definitions, finite hypothesis classes | slides; N2 | SSBD 2-4; MRT 2
M Sep 19 | No class: National day of mourning
T Sep 20 | Assignment 1 due at noon
T Sep 20 | Drop deadline
W Sep 21 | Uniform convergence, concentration inequalities | N2-3 | SSBD 4, B; MRT 2, D; Tel 12; Wainwright 2
Sa Sep 24 | Assignment 2 posted: pdf, tex
M Sep 26 | More uniform convergence: Rademacher complexity | N4 | MRT 3; SSBD 26; Tel 13
W Sep 28 | More Rademacher | N4 | MRT 3; SSBD 26; Tel 13
M Oct 3 | No Free Lunch Theorem; very start of VC dimension [guest lecturer: Nick Harvey] | N5 | SSBD 5; MRT 3
W Oct 5 | VC dimension [guest lecturer: Nick Harvey] | N6.1-6.3 | SSBD 6; MRT 3
M Oct 10 | No class: Thanksgiving
W Oct 12 | Assignment 2 due at noon
W Oct 12 | The fundamental theorem of statistical learning | N6.4-6.5 | SSBD 6, 28; MRT 3
M Oct 17 | Structural risk minimization | N7 | SSBD 7; MRT 4
W Oct 19 | MDL, consistency, start of margins | N7-8 | SSBD 7
M Oct 24 | Margins and SVMs | N9 | MRT 5; SSBD 15, 26
W Oct 26 | More SVMs | N9 | MRT 5; SSBD 15, 26
F Oct 28 | Assignment 3 posted: pdf, tex
F Oct 28 | Withdrawal deadline
M Oct 31 | Kernels I: setup | | MRT 6; SSBD 16
W Nov 2 | Kernels II: Moore-Aronszajn, representer theorems | | MRT 6; SSBD 16
M Nov 7 | Kernels III: examples, algorithms | | MRT 6; SSBD 16
T Nov 8 | Assignment 3 due at 11:59pm
W Nov 9 | No class: midterm break
M Nov 14 | Kernels IV: regularization, operators
W Nov 16 | Stability + convex learning problems | | SSBD 12, 13
Su Nov 20 | Assignment 4 posted: pdf, tex
Su Nov 20 | Assignment 5 posted: pdf, tex
Su Nov 20 | Paper reading assignment instructions posted
M Nov 21 | (Stochastic) gradient descent analysis | | SSBD 14
W Nov 23 | Finish SGD analysis; start implicit regularization | | SSBD 14
M Nov 28 | More implicit regularization / double descent; start NTK [Online: at NeurIPS] | slides | BHMM / NKBYBS / Tel 10
W Nov 30 | NTK [Online: at NeurIPS] | slides | Tel 4, 8
M Dec 5 | Universality and generalization in deep learning [Online: sick] | slides | Tel 2, 14
W Dec 7 | Grab bag: failures of uniform convergence; PAC-Bayes; online learning [Online: sick] | slides | NK; SSBD 31 / Guedj; SSBD 21
F Dec 9 | Paper reading assignment: proposal due by noon
Sa Dec 10 | Assignments 4 and 5 due at 11:59pm
W Dec 14 | Final exam (in person in ICCS 246, handwritten), 14:00 - 16:30
W Dec 21 | Reading assignment due at noon

Logistics

The course meets in person in Orchard Commons 3004, with occasional rare exceptions.

Grading scheme: 70% assignments, 30% final.

The lowest assignment grade (not including the project) will be dropped. Assignments should be done in LaTeX – not handwritten or in a word processor. Hand them in on Gradescope; more details to come soon.

One assignment, late in the term, will be a "mini-project": reading a paper and, for example, running a small exploratory experiment or doing a detailed exploration of its assumptions. Suggested papers and details to come later.

Overview

The brief idea of the course: when should we expect machine learning algorithms to work? What kinds of assumptions do we need to be able to rigorously prove that they will work?

Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities. Probably: PAC-Bayes, analysis of kernel methods, margin bounds, stability. Maybe: limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.
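For a taste of the flavour of result we'll start from, here is a sketch of the standard uniform convergence bound for a finite hypothesis class (assuming a loss bounded in [0, 1] and m i.i.d. samples; L_D denotes the population risk and L_S the empirical risk, roughly following SSBD's notation). With probability at least 1 - \delta,

\sup_{h \in \mathcal{H}} \bigl| L_{\mathcal{D}}(h) - L_S(h) \bigr| \le \sqrt{\frac{\ln\lvert\mathcal{H}\rvert + \ln(2/\delta)}{2m}},

which follows from Hoeffding's inequality applied to each h plus a union bound over the finitely many hypotheses in \mathcal{H}. Much of the course is about what replaces \ln\lvert\mathcal{H}\rvert when the class is infinite.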

Prerequisites

There are no formal prerequisites. I will roughly assume:

If you have any specific questions about your background, feel free to ask.

Resources

Books where we'll use sections in some detail:

Some other points of view you might like:

If you need to refresh your linear algebra or other areas of math:

Measure-theoretic probability is not required for this course, but there are instances and related areas where it could be helpful:

Similar courses: