I've given a "title" to each lecture, but the time spent on each topic rarely comes out to exactly 50 minutes, so most topics spill across lecture boundaries. This doesn't matter if you go through the lectures in order, but if you skip ahead to a particular topic you may need to look at the lecture before or after, and a lecture may open with leftover material from the previous topic.

I made the first version of these notes in 2015, but Mike Gelbart has also taught the course since 2016 and has made numerous improvements. My own lectures for this course were never recorded, but videos of the lectures from the Winter 2018 section taught by Mike Gelbart are available here (the material is largely the same).

**Bonus Slides**: Many lectures include "bonus material", and *these slides have a different background colour* (orange in the case of the 340 slides). These slides cover tangential or more advanced topics, and should probably be skipped if this is your first time seeing the material.
Note also that each lecture ends on the slide titled "Summary"; any slides after it contain only bonus material.

- Overview
- Exploratory Data Analysis
- Decision Trees (Notes on Big-O Notation)
- Fundamentals of Learning (Notation Guide)
- Probabilistic Classifiers (Probability Slides, Notes on Probability)
- Non-Parametric Models
- Ensemble Methods

- Least Squares (Notes on Calculus, Notes on Linear Algebra, Notes on Linear/Quadratic Gradients)
- Nonlinear Regression
- Gradient Descent
- Robust Regression
- Feature Selection
- Regularization
- More Regularization
- Linear Classifiers
- More Linear Classifiers
- Feature Engineering
- Convolutions
- Kernel Methods
- Stochastic Gradient
- Boosting
- MLE and MAP (Notes on Max and Argmax)

- Principal Component Analysis
- More PCA
- Sparse Matrix Factorization
- Recommender Systems
- Nonlinear Dimensionality Reduction

- Binary Density Estimation
- Bernoulli Distribution
- MAP Estimation
- Generative Classifiers
- Discriminative Classifiers
- Neural Networks
- Double Descent Curves
- Automatic Differentiation
- Convolutional Neural Networks
- Autoencoders
- Fully-Convolutional Networks

- Monte Carlo Approximation
- Conjugate Priors
- Bayesian Learning
- Empirical Bayes
- Multi-Class Classification
- What do we learn?
- Recurrent Neural Networks
- Long Short Term Memory
- Attention and Transformers

- Univariate Gaussian
- Multivariate Gaussian (Motivation)
- Multivariate Gaussian (Definition)
- Learning Gaussians
- Bayesian Linear Regression
- End to End Learning
- Exponential Family

- Markov Chains
- Learning Markov Chains
- Message Passing
- Markov Chain Monte Carlo
- Directed Acyclic Graphical Models
- Learning Graphical Models
- Log-Linear Models

- Mixture Models
- EM and KDE (Notes on EM)
- HMMs and RBMs (Forward-Backward for HMMs)
- Topic Models and Variational Inference
- VAEs and GANs

- Fundamentals of Learning
- More Fundamentals of Learning
- Convexity (Notes on Norms)
- More Convexity
- How Much Data?
- Faster Algorithms for Deep Learning?
- Probabilistic PCA, Factor Analysis, Independent Component Analysis
- Inference in Graphical Models
- Structured SVMs
- Expectation Maximization
- Non-Parametric Bayes
- Infinite Mixture Models

- Convex Optimization (Notes on Norms)
- Gradient Descent Progress (Notes on Convexity Inequalities, Notes on Implementing Gradient Descent)
- Gradient Descent Convergence
- Linear and Superlinear Convergence
- Subgradient Methods
- Projected-Gradient
- Proximal-Gradient
- Structured Regularization
- Coordinate Optimization
- Mirror Descent and Multi-Level Methods
- Randomized Algorithms
- Stochastic Subgradient
- Variance-Reduced Stochastic Gradient
- Kernel Methods and Fenchel Duality
- Online Learning
- Over-Parameterized Models

- Parallel and Distributed Machine Learning
- Online, Active, and Causal Learning
- Reinforcement Learning
- Overview of Other Large/Notable Topics
