Projects for CPSC 532S: Modern Statistical Learning Theory – 2021W2

Specifications

Projects should be done in groups of one to three people. The project counts as one homework assignment (but can't be dropped).

Project scope

Your paper should fit roughly into one of these three categories:

Some suggestions

Here is a smattering of interesting paper ideas I happen to already know about. It's worth browsing through recent proceedings of COLT (2021, 2020) or ALT (2021, 2020); lots of good learning theory papers appear in other venues, but most papers in these venues are relevant to this course.

Most of these are from the last few years, but it's fine to do older papers too.

Self-supervised learning: theory of pretext tasks, a follow-up; an attempted explanation of contrastive learning; why more negatives don't always help

New models for generalization: via optimal transport; via conditional mutual information; Towards a Unified Information-Theoretic Framework for Generalization; Distributional Generalization

Ensembles: Assessing generalization via disagreement and "a note on" that paper; "deconstructing distributions"; estimating accuracy from unlabeled data
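
For a flavour of the disagreement line of work: its headline observation is that the test error of a trained network is often close to the rate at which two independently trained copies of it disagree, which can be measured without any labels. A minimal sketch of that measurement, assuming scikit-learn-style classifiers (the particular model and synthetic data below are placeholders, not what the papers use):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Placeholder data; the papers study deep nets on real datasets.
    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Two runs of the same training procedure, differing only in random seed.
    models = [
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=s).fit(X_train, y_train)
        for s in (1, 2)
    ]

    preds = [m.predict(X_test) for m in models]
    disagreement = np.mean(preds[0] != preds[1])         # needs no labels
    test_errors = [np.mean(p != y_test) for p in preds]  # needs labels

    print(f"disagreement rate: {disagreement:.3f}")
    print(f"test errors:       {test_errors[0]:.3f}, {test_errors[1]:.3f}")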

Meta-learning: Provable Guarantees for Gradient-Based Meta-Learning; A Closer Look at the Training Strategy for Modern Meta-Learning

A few disparate applications of kernel mean embeddings: e.g. Distributionally Robust Optimization and Generalization in Kernel Methods; Towards a Learning Theory of Cause-Effect Inference; Learning Theory for Distribution Regression (kind of heavy)

Learning mixtures: sample complexity of mixtures of Gaussians (UBC paper)

Algorithm configuration: this paper, for example; a perspective paper

Fairness and generalization: Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

Stability of (S)GD: for SGD, followup 1, followup 2 (only partly about SGD)

Neural net generalization based on topology of learning paths

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

Lottery ticket hypothesis: the main paper; Stabilizing the Lottery Ticket Hypothesis; Linear Mode Connectivity and the Lottery Ticket Hypothesis; Pruning Neural Networks at Initialization: Why are We Missing the Mark?

Domain generalization: Invariant Risk Minimization, The Risks of Invariant Risk Minimization, Does Invariant Risk Minimization Capture Invariance? (extra points for sucking up if you pick this last one) (not really); Measuring Robustness to Natural Distribution Shifts in Image Classification; Understanding the Failure Modes of Out-of-Distribution Generalization; relationship to calibration

Neural Collapse: the paper, a criticism

Double descent: the survey, exploration in deep learning, tons of other followups

Relatedly, interpolation learning: follow citations from e.g. this paper; Does Learning Require Memorization?

Privacy and online learning: An Equivalence Between Private Classification and Online Prediction, but also A Computational Separation between Private Learning and Online Learning

Fancy concentration inequalities for learning bounds

Research idea, non-vacuous bounds and testing: try using various non-vacuous generalization bounds in a classifier two-sample test. This could definitely turn into a nice workshop paper, or potentially a full paper with some effort, but you could do a small version for the project; talk to me if you're interested.
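
To make that concrete: a classifier two-sample test trains a classifier to distinguish samples from two distributions P and Q, and rejects the hypothesis P = Q if its held-out accuracy is significantly above chance; the proposal is to replace the held-out accuracy estimate with a non-vacuous generalization bound computed from the training set. A minimal sketch of the vanilla test, assuming scikit-learn and scipy (the logistic-regression classifier and Gaussian data are illustrative placeholders):

    import numpy as np
    from scipy.stats import binomtest
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Placeholder samples from two distributions P and Q (slightly shifted Gaussians).
    P = rng.normal(loc=0.0, size=(500, 10))
    Q = rng.normal(loc=0.2, size=(500, 10))

    # Label each point by which distribution it came from, then split.
    X = np.vstack([P, Q])
    y = np.concatenate([np.zeros(len(P)), np.ones(len(Q))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    # Classifier two-sample test: under P = Q, held-out accuracy should be ~1/2.
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    n_correct = int((clf.predict(X_te) == y_te).sum())
    p_value = binomtest(n_correct, n=len(y_te), p=0.5, alternative="greater").pvalue

    print(f"held-out accuracy: {n_correct / len(y_te):.3f}, p-value: {p_value:.4f}")

    # The project idea: certify above-chance accuracy via a non-vacuous
    # generalization bound on the training data (e.g. a PAC-Bayes bound for a
    # stochastic classifier) instead of a held-out estimate.

One natural question to explore is which bounds are tight enough that the resulting test still has reasonable power.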