Previously offered in 23W1, 22W1, and (with the name 532S) 21W2; this instance will be broadly similar.

This is a course on the mathematical foundations of machine learning. When should we expect ML algorithms to work (or not work), and what kinds of assumptions do we need to make to rigorously prove this?

- SSBD (2014) is the easiest to understand, but sometimes oversimplified (occasionally in ways that make it incorrect);
- MRT (second edition in 2018) is written in a somewhat more opaque style, but is at about the level of detail we'll mostly use;
- Bach (draft as of 2024) is at roughly the level of MRT; in my opinion it's easier to follow, but it has a somewhat different focus (more stats-y than CS-y);
- Zhang (2023) is usually at a more advanced, and often more abstract, level than we mostly cover.

Day | Date | Topic | Supplements
---|---|---|---
M | Sep 2 | No class: Labour Day | |
W | Sep 4 | Course intro, ERM | SSBD 1-2; MRT 2; Bach 2 |
Th | Sep 5 | Assignment 1 posted (pdf, tex) — loss functions, finite classes | |
M | Sep 9 | Uniform convergence with finite classes | SSBD 2-4; MRT 2 |
W | Sep 11 | Concentration inequalities | SSBD B; MRT D; Bach 1.2; Zhang 2; Wainwright 2 |
M | Sep 16 | Assignment 1 due at noon | |
M | Sep 16 | PAC learning; covering numbers | SSBD 3; MRT 2; Bach 4.4.4; Zhang 3.4/4/5 |
M | Sep 16 | Drop deadline | |
W | Sep 18 | Finish covering numbers; start Rademacher | |
M | Sep 23 | Rademacher complexity | MRT 3.1; SSBD 26; Bach 4.5; Zhang 6 |
M | Sep 23 | Assignment 2 posted (pdf, tex) — PAC, concentration, covering numbers | |
W | Sep 25 | Finish Rademacher | |
Su | Sep 29 | Assignment 3 posted (pdf, tex) — Rademacher, VC | |
M | Sep 30 | No class: National Day for Truth and Reconciliation | |
W | Oct 2 | VC dimension | SSBD 6; MRT 3.2-3.3 |
M | Oct 7 | Finish VC; online learning | |
W | Oct 9 | More online learning; maybe start bandits | |
F | Oct 11 | Assignment 2 due at 11:59pm | |
M | Oct 14 | No class: Thanksgiving Day | |
W | Oct 16 | Bandits | |
M | Oct 21 | More bandits; maybe start no free lunch | |
W | Oct 23 | No free lunch / “fundamental theorem” | SSBD 5; MRT 3.4; Bach 4.6/12; Zhang 12 |
F | Oct 25 | Withdrawal deadline | |
F | Oct 25 | Assignment 3 due at 11:59pm | |
F | Oct 25 | Assignment 4 posted — online learning, bandits, no free lunch | |
M | Oct 28 | | |
W | Oct 30 | | |
M | Nov 4 | | |
W | Nov 6 | | |
F | Nov 8 | Assignment 4 due at 11:59pm | |
F | Nov 8 | Assignment 5 posted — topics TBD | |
F | Nov 8 | Paper-reading assignment: choice of papers posted | |
M | Nov 11 | No class: midterm break / Remembrance Day | |
W | Nov 13 | No class: midterm break | |
M | Nov 18 | | |
W | Nov 20 | | |
M | Nov 25 | | |
W | Nov 27 | | |
F | Nov 29 | Assignment 5 due at 11:59pm | |
M | Dec 2 | | |
W | Dec 4 | | |
? | Dec ?? | Final exam (in person, handwritten) — sometime Dec 10-21, likely Dec 16-21 to avoid NeurIPS | |
? | Dec ?? | Paper-reading assignment oral presentations, to be individually scheduled | |

The course meets in person in Ponderosa Commons North Oak/Cedar House 1011, with possible rare exceptions (e.g. if I get sick but can still teach, I'll move it online). Note that this room does not have a recording setup, but if you need to miss class for some reason and would prefer some form of a janky recording to just reading the lecture notes, talk to me.

Grading scheme: 70% assignments, 30% final.

- There will be four or five assignments through the term involving proving and exploring various things related to the course material. These should be written in LaTeX and handed in on Gradescope; expect each one to take a significant amount of time. They will mostly consist of proofs / similar math and conceptual questions; there may occasionally be some light programming involved, or maybe not. Nothing major on the programming side, just exploring some concepts / running some experiments to see how they interact with the math.
- There will be one or two assignments that involve reading a paper, reacting to it, poking at its assumptions slightly further, and possibly answering questions about it orally; details to come.
- The final will be a traditional handwritten, in-person exam.

Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities, margin bounds, stability. Also, most of: PAC-Bayes, analysis of kernel methods, limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.
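As a small taste of the concentration-inequality material, here's a quick simulation sketch (my own illustration, not part of the course materials) that empirically checks Hoeffding's inequality for fair coin flips:

```python
# Empirical sanity check of Hoeffding's inequality: for n i.i.d. samples
# bounded in [0, 1] with mean mu, the empirical mean satisfies
#   P(|empirical mean - mu| >= t) <= 2 * exp(-2 * n * t^2).
import math
import random

random.seed(0)
n, t, reps = 200, 0.1, 10_000
mu = 0.5  # fair-coin samples, bounded in [0, 1]

exceed = 0
for _ in range(reps):
    # Empirical mean of n Bernoulli(mu) samples.
    mean_hat = sum(random.random() < mu for _ in range(n)) / n
    if abs(mean_hat - mu) >= t:
        exceed += 1

empirical = exceed / reps
hoeffding = 2 * math.exp(-2 * n * t ** 2)
print(f"empirical tail: {empirical:.4f} <= Hoeffding bound: {hoeffding:.4f}")
```

The bound here is about 0.037, and the observed tail frequency comes in well under it; much of the course is about proving when (and how tightly) statements like this hold uniformly over a whole class of predictors.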

There are no formal prerequisites.

**TL;DR:** if you've done well in CS 340/540 or 440/550, didn't struggle with the probability stuff there, and are also pretty comfortable with proofs, you'll be fine. If not, keep reading.

I will roughly assume the following; if you're missing one of them, you can probably get by, but if you're missing multiple, talk to me about it.

- Basic "mathematical maturity": familiarity with reading and writing proofs, recognizing a valid proof, etc. Math 220 or having succeeded at most third-year or up courses in math or theoretical CS/stat should be indicative of this being fine.
- Ideally, a basic understanding of machine learning, as in CS 340 or probably 330/related courses. If you don't have this, you should still be able to get by, but might have to do a little more reading on your own.
- Probability: basic comfort with manipulating probability statements, etc. Anyone having done well in CS 420, 436R/536N, 440/550, 532W, most graduate Stat classes, many graduate CS theory classes, etc should be fine. Ideally, Math 302/318/418 or Stat 302. Also probably should be fine: Stat 241/251, Econ 325/327.
- Linear algebra: ideally Math 310, but Math 152/221/223 are fine; anyone who's done well in CS 340/540 or 440/550 should probably be fine. We don't need anything special here, I'm just going to write matrix expressions and assume you know what's going on. Later in the course we will use a little bit of abstract vector spaces / Hilbert spaces, but we'll cover that if you haven't seen it before.
- Multivariate calculus: maybe Math 200/217/226/253/263; anyone who's done well in CS 340/540 or 440/550 should be fine. Nothing special here either, I'm just not going to explain what a gradient is.
- Analysis of algorithms: CS 221 (basic algorithms + data structures) should be fine to get by, though CS 421 (NP-completeness, etc) would be the best case. Any statisticians or mathematicians who haven't taken this, you should be fine.
- **Basic** mathematical programming ability in any language: being able to plot functions, etc.
- Ideally, familiarity with programming in a machine learning / statistical context, e.g. being comfortable with numpy and PyTorch/TensorFlow/etc. This is **not required**, but some options may be easier / more fruitful / more fun if you're comfortable with it.
- In the *absolute best case*, real analysis and functional analysis (e.g. Math 320, 321, 420, 421). This is **very much not required**, and most people who've succeeded in the course haven't taken this, but we use a few tools from these fields. I'll explain what we need from them as we see them.

If you need to refresh your linear algebra or other areas of math:

- Mathematics for Machine Learning (Marc Deisenroth, Aldo Faisal, Cheng Soon Ong; 2020) – a nice overview of basics.
- Linear Algebra Done Right (Sheldon Axler, fourth edition 2024) – a great introduction towards the abstract sides of linear algebra; the book I used for self-study as a teenager.

In addition to the books above, some other points of view you might like:

- Mathematical Foundations of Machine Learning (Robert Nowak; 2022) – course notes.
- Deep learning theory lecture notes (Matus Telgarsky; 2021) – course notes.
- Introduction to Statistical Learning Theory (Olivier Bousquet, Stéphane Boucheron, Gábor Lugosi; 2003) – 40-page survey of classics.
- On the Mathematical Foundations of Learning (Felipe Cucker, Steve Smale; 2001) – 50-page survey of classics.
- An Introduction to Computational Learning Theory (Michael Kearns, Umesh Vazirani; 1994; that link should get you a copy with a UBC login) – a book from a much more CS theory point of view. A nice complement to what we'll cover in this course.

Measure-theoretic probability is *not* required for this course, but there are places in the course, and in related areas, where it could be helpful:

- A Measure Theory Tutorial (Measure Theory for Dummies) (Maya Gupta) – 5 pages, just the basics
- Measure Theory, 2010 (Greg Hjorth) – 110 pages but comes recommended as both thorough and readable
- A Probability Path (Sidney Resnick) – a frequently recommended textbook aimed at non-mathematicians who want to learn it in detail, though at the scale of a full-semester course; available if you log in via UBC
- There are also lots of other places, of course; e.g. the probability textbooks by Billingsley, Klenke, and Williams are (I think) classics.

Similar courses:

- UBC CPSC 531H: Machine Learning Theory (Nick Harvey, not taught for a while)
- MIT 9.520: Statistical Learning Theory and Applications (Tomaso Poggio, Lorenzo Rosasco, Alexander Rakhlin, Andrzej Banburski)
- TTIC 31120: Computational and Statistical Learning Theory (Nati Srebro)
- CMU 10-806: Foundations of Machine Learning and Data Science (Nina Balcan, Avrim Blum)
- UIUC ECE 598MR: Statistical Learning Theory (Maxim Raginsky)
- Stanford STATS214: Machine Learning Theory (Tengyu Ma)

The vast majority of you are grad students. In grad school, the point of classes is to learn things; **grades are not the point**. If you're a PhD student, grades are almost totally irrelevant to your life; if you're a master's student, they *barely* matter, if at all. Cheating, or skirting the boundaries of cheating, does not help your learning, but it does potentially set you down a slippery slope towards *research* misconduct, which undermines the whole enterprise of science. So, don't cheat; *talk to me* about what you're struggling with that's making you feel like you need to cheat, and we can figure something out. (Or just take the lower grade.)

You can read more about general UBC policies: Academic Honesty, Academic Misconduct, and Disciplinary Measures. You should also read the Student Declaration and Responsibility to which you've already implicitly agreed.

For the purposes of this class specifically:

- On assignments, it is **extremely not okay** to ask a friend or look online for old solution files (or to otherwise use them if you come across them). It's also **not okay** to specifically seek out answers to a homework question online, since we'll sometimes use semi-standard questions. It **is okay** to look at general reference material on related topics, e.g. the materials linked on the course site. Sometimes in doing that, you might accidentally stumble across the solution to a problem. If so, that's okay; just don't copy it verbatim, make sure you fully understand what you're writing, and **say what happened**, including a link or reference to the specific source.
- It is also **not okay** to share any assignment solution files we release: with friends, uploaded to websites, or otherwise.
- It is **not okay** to sit down with a classmate and write out solutions together, unless you're explicit assignment partners (not allowed on assignment 1). It **is okay** to talk about general ideas, strategies, and techniques with people in the class / others who might know things about the topics; if you do, (a) **write up the solution alone, without notes, ensuring you fully understand it**, and (b) **say who you talked to and to what extent**.
- When you do an assignment with a partner, it is fine either to work separately and then merge answers, or to sit down and solve the problems together, but it is **not okay** to just say “you do problem 1, I'll do problem 2.” By putting your name on an assignment you hand in, you are pledging that you fully understand, and contributed at least a little to, every part of what you hand in. If you end up doing part of the assignment together and part separately, then hand in two solutions and put a note on each relevant question like “I did this problem with Alice.”
- It is **not okay** to use generative AI tools (ChatGPT, etc) on these homework problems, *except* that it's acceptable to use them for pure content-neutral LaTeX help if you find that useful.
- For the exam, standard exam rules apply: I'll tell you what written sources you can use, and there's no referring to any other materials, looking over someone's shoulder, going to the washroom and instead running to a friend's office to google something, etc.
- If you're not sure what's allowed and what's not, ask!

Penalties for academic dishonesty can be quite severe (including failing the course or being expelled from the program), and the process is *very unpleasant* for everyone involved. Please don't.

This course, including class / Piazza / office hours / etc, is a positive space both in the specific sense linked there and also in that I truly hope everyone can feel safe and supported in the class. If anyone other than me is making you feel uncomfortable in any way, please let me know immediately. If I'm the one causing the issue and you would rather speak to someone other than me, please immediately talk to your departmental advisor or one of the CS department heads (currently Margo Seltzer and Joanna McGrenere; email `head@cs.ubc.ca`).