Or ask after class, or on Piazza, or request another time (potential availability calendar if helpful).

Previously offered in in 2022W1 and (with the name 532S) in 2021W2; this instance will be broadly similar.

- SSBD is the easiest to understand but sometimes over-simplified (sometimes in ways that make it incorrect);
- MRT can be written in a more opaque way, but is at about the level of detail we mostly use;
- Bach is maybe similar to MRT, in my opinion easier to follow but has a somewhat different focus;
- Zhang is usually at a more advanced, and often more abstract, level than we mostly cover.

Date | Topic | Supplements | |
---|---|---|---|

Tu | Sep 5 | No class: Imagine Day | |

Th | Sep 7 | Course intro, ERM | SSBD 1-2; MRT 2 |

F | Sep 8 | Assignment 1 posted: pdf, tex | |

Tu | Sep 12 | Class canceled: sick | |

Th | Sep 14 | Uniform convergence with finite classes [Online: sick] | SSBD 2-4; MRT 2 |

M | Sep 18 | Assignment 1 due at noon | |

M | Sep 18 | Drop deadline | |

Tu | Sep 19 | Concentration inequalities | SSBD B; MRT D Zhang 2; Wainwright 2 |

Th | Sep 21 | PAC learning; covering numbers | SSBD 3, MRT 2 Bach 4.4.4, Zhang 3.4/4/5 |

Sa | Sep 23 | Assignment 2 posted: pdf, tex | |

Tu | Sep 26 | Rademacher complexity | MRT 3.1; SSBD 26; Bach 4.5; Zhang 6 |

Th | Sep 28 | ||

Tu | Oct 3 | VC dimension | SSBD 6; MRT 3.2-3.3 |

Th | Oct 5 | finish VC; No Free Lunch; “Fundamental Theorem” | SSBD 5; MRT 3.4 Bach 4.6 / 12; Zhang 12 |

Tu | Oct 10 | Structural Risk Minimization / Min Description Length | SSBD 7; MRT 4 |

W | Oct 11 | Assignment 2 due at midnight | |

Th | Oct 12 | No class: UBC follows a Monday schedule | |

Tu | Oct 17 | finish SRM, MDL; briefly start margins | |

Th | Oct 19 | Margins, SVMs | MRT 5; SSBD 15, 26 |

M | Oct 23 | Assignment 3 posted: pdf, tex | |

Tu | Oct 24 | More margins/SVMs | MRT 5; SSBD 15, 26 |

Th | Oct 26 | Kernels | Bach 7, MRT 6, SSBD 16 |

F | Oct 27 | Withdrawal deadline | |

Tu | Oct 31 | More kernels | |

Th | Nov 2 | Talk by Yejin Choi on limits of LLMs, Fred Kaiser Building 2020/2030 | |

Tu | Nov 7 | Universal approximation | Telgarsky 2; SSBD 20; Bach 9.3; SC 4.6 |

Th | Nov 9 | Finish approximation; Is ERM enough? [Online: at a workshop] | |

F | Nov 10 | Assignment 3 due at midnight | |

Tu | Nov 14 | No class: midterm break | |

Th | Nov 16 | Stability, regularization, convex problems | SSBD 12-13, MRT 14 |

Tu | Nov 21 | ||

Th | Nov 23 | (Stochastic) gradient descent | SSBD 14, Bach 5 |

M | Nov 27 | Assignment 4 posted: pdf, tex | |

Tu | Nov 28 | Nonconvex optimization, start neural tangent kernels | |

Th | Nov 30 | Neural tangent kernels | Telgarsky 4, Bach 11.3 |

Tu | Dec 5 | Implicit regularization | Bach 11.1 |

Th | Dec 7 | Grab-bag | |

M | Dec 18 | Final exam (in person, handwritten) — 1-3:30pm, ICCS 246 | |

W | Dec 20 | Assignment 4 due at midnight |

The course meets in person in Swing 210, with possible rare exceptions (e.g. if I get sick but can still teach, I'll move it online). Note that this room does not have a recording setup.

Grading scheme: 70% assignments, 30% final.

There will be four or five written assignments through the term; answers should be written in LaTeX, and handed in on Gradescope. There will also be a small number (one or two) of assignments that involve reading a paper, reacting to it, and poking at it slightly further; details to come.

The brief idea of the course: when should we expect machine learning algorithms to work? What kinds of assumptions do we need to be able to be able to rigorously prove that they will work?

Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities, margin bounds, stability. Also, most of: PAC-Bayes, analysis of kernel methods, limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.

There are no formal prerequisites. I will roughly assume:

- Basic "mathematical maturity": familiarity with reading and writing proofs, recognizing a valid proof, etc. If you've taken a third-year math course or similar, you should be fine.
- Comfort with linear algebra, multivariable calculus, basic probability theory, and basic analysis of algorithms.
- Ideally, a basic understanding of machine learning, as in CPSC 340. If you don't have this, you should still be able to get by, but might have to do a little more reading on your own.
**Basic**mathematical programming ability in any language: being able to plot functions, etc.- Ideally, familiarity with programming in a machine learning / statistical context, e.g. being comfortable with numpy and PyTorch/TensorFlow/etc. This is
**not required**, but there will some assignment options may be easier / more fruitful / more fun if you're comfortable with it.

Books that the course will definitely pull from:

**[SSBD]**Understanding Machine Learning: From Theory to Algorithms (Shai Shalev-Shwartz, Shai Ben-David; 2014) – a very readable (free) book that covers a significant portion of our material**[MRT]**Foundations of Machine Learning (Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwakar; second edition 2018) – free textbook, a little more "graduate-oriented", that covers certain topics (especially Rademacher complexity) in more depth

New books where I may or may not pull from sections, TBD:

**[Bach]**Learning Theory from First Principles (Francis Bach; 2023 draft)**[Zhang]**Mathematical Analysis of Machine Learning Algorithms (Tong Zhang; 2023 pre-publication version)

Some other points of view you might like:

- Introduction to Statistical Learning Theory (Olivier Bousquet, Stéphane Boucheron, Gábor Lugosi; 2003) – 40-page survey of classics
- On the Mathematical Foundations of Learning (Felipe Cucker, Steve Smale; 2001) – 50-page survey of classics
- An Introduction to Computational Learning Theory (Michael Kearns, Umesh Vazirani; 1994; that link should get you a copy with a UBC login) – a book from a much more CS theory point of view. A nice complement to what we'll cover in this course.

If you need to refresh your linear algebra or other areas of math:

- Mathematics for Machine Learning (Marc Deisenroth, Aldo Faisal, Cheng Soon Ong; 2020) – a nice overview of basics.

Measure-theoretic probability is *not* required for this course, but there are instances and related areas where it could be helpful:

- A Measure Theory Tutorial (Measure Theory for Dummies) (Maya Gupta) – 5 pages, just the basics
- Measure Theory, 2010 (Greg Hjorth) – 110 pages but comes recommended as both thorough and readable
- A Probability Path (Sidney Resnick) – frequently recommended textbook aimed at non-mathematicians to learn it in detail, but it's a full-semester textbook scale of detail; available if you log in via UBC
- There are also lots of other places, of course; e.g. the probability textbooks by Billingsley, Klenke, and Williams are (I think) classics.

Similar courses:

- UBC CPSC 531H: Machine Learning Theory (Nick Harvey, not taught for a while)
- MIT 9.520: Statistical Learning Theory and Applications (Tomaso Poggio, Loreno Rosasco, Alexander Rakhlin, Ardrzej Banburski)
- TTIC 31120: Computational and Statistical Learning Theory (Nati Srebro)
- CMU 10-806: Foundations of Machine Learning and Data Science (Nina Balcan, Avrim Blum)
- UIUC ECE 598MR: Statistical Learning Theory (Maxim Raginsky)
- Stanford STATS214: Machine Learning Theory (Tengyu Ma)