Projects should be done in groups of one to three people. The project counts as one homework assignment (but can't be dropped).

- **10 points, due Wed March 16:** a **very short proposal** (~1 paragraph, including links to a relevant paper or two), posted as a private Piazza post shared with me and your group. I'll give you feedback as quickly as I can. It's okay to change your topic after this if needed, but talk to me if it's a big change.
- **20 points, Wed Apr 6:** an **in-class presentation** on the last day of class. This will be about 5-10 minutes long (strictly enforced), depending on the number of groups (exact length TBA). Explain the topic, any new results, give an overview of techniques, etc. It's preferable to come in person, but you can also present via Zoom.
- **70 points, due Fri Apr 8:** a **project report** in NeurIPS format. There is no hard lower limit on length, but I expect a successful project to be at least about 4 pages, with a 10-page cap; you can submit appendices of unlimited length if you feel you need them, but (as when I'm reviewing for NeurIPS) don't necessarily expect me to read them.

Your paper should fit roughly into one of these three categories:

**Literature survey**: Read several related papers on a learning theory topic, and write a document that overviews their results and proof techniques, relates the problem settings to each other, etc. Please try not to just replicate an existing survey, or copy the related-work section of a paper that's a "sink" in the citation graph: read a few original papers and give your own understanding of how they relate.

**Extension**: Do a deep, critical read of one or two learning theory papers. Maybe run a few experiments checking whether their assumptions / conclusions seem to hold in a practical setting or two, determine mathematically whether their assumptions hold in some interesting setting, or think about whether their theorem really shows what their introduction claims it does. Possibly weaken an assumption in the paper, prove an interesting corollary, etc. Write a document outlining the paper and its proof techniques, and describe your new results / discussion as appropriate.

**Novel analysis**: Analyze an ML algorithm / setting in a way it hasn't been analyzed before. This analysis should be nontrivial; it can be based on things we've learned in class, or other techniques if needed. It's okay if the analysis doesn't work out, as long as it seemed like a reasonable thing to do in the first place; you can then produce a document explaining *why* it didn't. But if you take this option, you should probably have a backup plan.

Here's a smattering of interesting paper ideas I happen to already know about. It's also worth browsing through recent proceedings of COLT (2021, 2020) or ALT (2021, 2020); lots of good learning theory papers appear in other venues too, but most papers in these venues are relevant to this course.

Most of these are from the last few years, but it's fine to do older papers too.

**Self-supervised learning:** theory of pretext tasks, a follow-up; an attempted explanation of contrastive learning; why more negatives don't always help

**New models for generalization:** via optimal transport; via conditional mutual information; Towards a Unified Information-Theoretic Framework for Generalization; Distributional Generalization

**Ensembles:** Assessing generalization via disagreement and "a note on" that paper; "deconstructing distributions"; estimating accuracy from unlabeled data

**Meta-learning:** Provable Guarantees for Gradient-Based Meta-Learning; A Closer Look at the Training Strategy for Modern Meta-Learning

A few disparate **applications of kernel mean embeddings:** e.g. Distributionally Robust Optimization and Generalization in Kernel Methods; Towards a Learning Theory of Cause-Effect Inference; Learning Theory for Distribution Regression (kind of heavy)

**Learning mixtures:** sample complexity of mixtures of Gaussians (UBC paper)

**Algorithm configuration:** this paper, for example; a perspective paper

**Fairness and generalization:** Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

**Stability of (S)GD:** for SGD, followup 1, followup 2 (only partly about SGD)

Neural net generalization based on topology of learning paths

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

**Lottery ticket hypothesis:** the main paper; Stabilizing the Lottery Ticket Hypothesis; Linear Mode Connectivity and the Lottery Ticket Hypothesis; Pruning Neural Networks at Initialization: Why are We Missing the Mark?

**Domain generalization:** Invariant Risk Minimization; The Risks of Invariant Risk Minimization; Does Invariant Risk Minimization Capture Invariance? (extra points for sucking up if you pick this last one; not really); Measuring Robustness to Natural Distribution Shifts in Image Classification; Understanding the Failure Modes of Out-of-Distribution Generalization; relationship to calibration

**Neural Collapse:** the paper, a criticism

**Double descent:** the survey, exploration in deep learning, tons of other followups

Relatedly, **interpolation learning:** follow citations from e.g. this paper; Does Learning Require Memorization?

An Equivalence Between Private Classification and Online Prediction, but A Computational Separation between Private Learning and Online Learning

Fancy concentration inequalities for learning bounds

**Research idea: Non-vacuous bounds and testing:** try using various non-vacuous generalization bounds in a classifier two-sample test. This could definitely turn into a nice workshop paper, or potentially a full paper with some effort, but you can do a small version for the project; talk to me if you're interested.
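For context on the idea above: a classifier two-sample test trains a classifier to tell samples of $P$ from samples of $Q$, then rejects the null hypothesis $P = Q$ if the classifier's held-out accuracy is significantly above chance. Here's a minimal NumPy sketch under stated assumptions: the nearest-centroid "classifier," the function name `c2st_pvalue`, and the normal approximation to the binomial null are my own illustrative choices, not from any assigned paper.

```python
import math
import numpy as np

def c2st_pvalue(X, Y, seed=0):
    """Classifier two-sample test sketch: one-sided p-value for H0: P = Q.

    Splits each sample in half, "trains" a nearest-centroid classifier on
    the first halves, and tests it on the held-out halves.
    """
    rng = np.random.default_rng(seed)
    n = min(len(X), len(Y))
    # Shuffle and equalize sample sizes.
    X = X[rng.permutation(len(X))[:n]]
    Y = Y[rng.permutation(len(Y))[:n]]
    half = n // 2

    # "Training": just the centroid of each training half.
    mu_x = X[:half].mean(axis=0)
    mu_y = Y[:half].mean(axis=0)

    # "Testing": classify held-out points by the nearer centroid.
    test = np.vstack([X[half:], Y[half:]])
    labels = np.r_[np.zeros(n - half), np.ones(n - half)]
    dist_x = np.linalg.norm(test - mu_x, axis=1)
    dist_y = np.linalg.norm(test - mu_y, axis=1)
    preds = (dist_y < dist_x).astype(float)  # 1 means "looks like Y"
    acc = float((preds == labels).mean())

    # Under H0, accuracy is roughly N(1/2, 1/(4m)); one-sided p-value.
    m = len(test)
    z = (acc - 0.5) * math.sqrt(4 * m)
    pvalue = 0.5 * math.erfc(z / math.sqrt(2))
    return acc, pvalue
```

Swapping the nearest-centroid rule for a neural net whose held-out error is controlled by a non-vacuous generalization bound, rather than a fresh test split, is roughly where the research idea would start.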