I've given a "title" to each lecture, but the time I spent on each topic rarely equaled exactly 50 minutes, so most topics are spread across more or less than one lecture. This does not matter if you go through the lectures in order, but if you "skip" to a certain topic you may need to look at the lecture before or after, and a lecture may begin with material from the previous topic.

I made the first version of these notes in 2015, but Mike Gelbart has also been teaching the course since 2016 and has made numerous improvements. Although my own lectures for this course have never been recorded, videos of the lectures from the Winter 2018 section taught by Mike Gelbart are available here (the material is largely the same).

**Bonus Slides**: Many lectures include "bonus material", and *these slides have a different background colour* (orange in the case of the 340 slides). These slides cover tangential or more-advanced topics, and should probably be skipped if this is the first time you are seeing this material.
Also note that each lecture ends on the slide titled "Summary"; all slides after it typically contain only "bonus material".

- Overview
- Exploratory Data Analysis
- Decision Trees (Notes on Big-O Notation)
- Fundamentals of Learning (Notation Guide)
- Probabilistic Classifiers (Probability Slides, Notes on Probability)
- Non-Parametric Models
- Ensemble Methods

- Linear Regression (Notes on Calculus, Notes on Linear Algebra, Notes on Linear/Quadratic Gradients)
- Nonlinear Regression
- Gradient Descent
- Robust Regression
- Feature Selection
- Regularization
- More Regularization
- Linear Classifiers
- More Linear Classifiers
- Feature Engineering
- Convolutions
- Kernel Trick
- Stochastic Gradient Descent
- Boosting
- MLE and MAP (Notes on Max and Argmax)

- Neural Networks
- Over-Parameterization
- Deep Neural Networks
- Convolutional Neural Networks
- Autoencoders and Multi-Label
- Fully-Convolutional Networks
- Recurrent Neural Networks
- LSTMs and Transformers
- What do we Learn?

- Structure Learning
- Sequence Mining
- Tensor Basics
- Semi-Supervised Learning
- PageRank
- Optimization Theory and Practice

- Binary Density Estimation
- Bernoulli Distribution
- MAP Estimation
- Generative Classifiers
- Discriminative Classifiers
- Neural Networks
- Double Descent Curves
- Automatic Differentiation
- Convolutional Neural Networks
- Autoencoders
- Fully-Convolutional Networks

- Monte Carlo Approximation
- Conjugate Priors
- Bayesian Learning
- Empirical Bayes
- Multi-Class Classification
- What do we Learn?
- Recurrent Neural Networks
- Long Short Term Memory
- Attention and Transformers

- Univariate Gaussian
- Multivariate Gaussian (Motivation)
- Multivariate Gaussian (Definition)
- Learning Gaussians
- Bayesian Linear Regression
- End to End Learning
- Exponential Family

- Markov Chains
- Learning Markov Chains
- Message Passing
- Markov Chain Monte Carlo
- Directed Acyclic Graphical Models
- Learning Graphical Models
- Log-Linear Models

- Mixture Models
- EM and KDE (Notes on EM)
- HMMs and RBMs (Forward-Backward for HMMs)
- Topic Models and Variational Inference
- VAEs and GANs

- Fundamentals of Learning
- More Fundamentals of Learning
- Convexity (Notes on Norms)
- More Convexity
- How Much Data?
- Faster Algorithms for Deep Learning?
- Probabilistic PCA, Factor Analysis, Independent Component Analysis
- Inference in Graphical Models
- Structured SVMs
- Expectation Maximization
- Non-Parametric Bayes
- Infinite Mixture Models

- Convex Sets and Convex Functions (Notes on Norms)
- How many iterations of gradient descent do we need? (Notes on Convexity Inequalities, Notes on Implementing Gradient Descent)
- Momentum, acceleration, and second-order methods
- Coordinate optimization and stochastic gradient descent
- SGD with Constant Step Sizes, Growing Batches, and Over-Parameterization
- Variance reduction and 1.5-Order Methods
- Projected Gradient, Projected Newton, and Frank-Wolfe
- Global Optimization, Subgradients, and Cutting Planes
- Proximal-Gradient and Fenchel Duality
- Group Sparsity, Structured Regularization, and Kernel Methods
- Mirror Descent and Multi-Level Methods
- Online Learning

- Parallel and Distributed Machine Learning
- Online, Active, and Causal Learning
- Reinforcement Learning
- Overview of Other Large/Notable Topics
