I've given a "title" to each lecture, but the time I spent on each topic rarely came to exactly 50 minutes, so most topics span somewhat more or less than one lecture. This doesn't matter if you go through the lectures in order, but if you "skip" ahead to a particular topic you may need to look at the lecture before or after, and a lecture may open with material left over from the previous topic.

Although I made the first version of these notes in 2015, Mike Gelbart has also been teaching the course since 2016 and has made numerous improvements. Note that many lectures include "bonus material", and these slides have a different background colour. These slides cover tangential or more-advanced topics, and should probably be skipped if this is the first time you are seeing this material.

Although I've never had my lectures for this course recorded, videos of the lectures from the Winter 2018 section of this course taught by Mike Gelbart are available here (the material is largely the same).

- Overview
- Exploratory Data Analysis
- Decision Trees (Notes on Big-O Notation)
- Fundamentals of Learning (Notation Guide)
- Probabilistic Classifiers (Probability Slides, Notes on Probability)
- Non-Parametric Models
- Ensemble Methods

- Linear Regression (Notes on Calculus, Notes on Linear Algebra, Notes on Linear/Quadratic Gradients)
- Nonlinear Regression
- Gradient Descent
- Robust Regression
- Feature Selection
- Regularization
- More Regularization
- Linear Classifiers
- More Linear Classifiers
- Multi-Class Linear Classifiers
- Kernel Methods
- Stochastic Gradient
- Maximum Likelihood (Notes on Max and Argmax)
- MAP Estimation

- Principal Component Analysis
- More PCA
- Sparse Matrix Factorization
- Recommender Systems
- Nonlinear Dimensionality Reduction

- Neural Networks
- More Neural Networks
- Even More Neural Networks
- Convolutional Neural Networks
- More CNNs, Boosting

Videos covering the first month of material in the 2016 offering are available here. Note that the material has been substantially improved since then.

- Gradient Descent Convergence
- Rates of Convergence (Notes on Convexity Inequalities, Notes on Implementing Gradient Descent)
- Subgradients
- Proximal Gradient
- Structured Regularization
- Coordinate Optimization
- Stochastic Subgradient
- SGD Convergence Rate
- Stochastic Average Gradient
- Kernel Methods and Fenchel Duality

- Density Estimation
- Multivariate Gaussians
- Mixture Models
- Expectation Maximization (Notes on EM)
- Kernel Density Estimation
- Probabilistic PCA, Factor Analysis, Independent Component Analysis

- Markov Chains
- Monte Carlo Methods
- Message Passing
- Hidden Markov Models
- DAG Models
- More DAGs
- Undirected Graphical Models
- Approximate Inference
- Log-Linear Models
- Boltzmann Machines

- Conditional Random Fields
- Structured SVMs
- Deep Structured Models
- Fully-Convolutional Networks
- Recurrent Neural Networks
- Long Short Term Memory

- Bayesian Statistics
- Empirical Bayes
- Hierarchical Bayes
- Topic Models
- More Approximate Inference
- Non-Parametric Bayes
- VAEs and GANs

- Parallel and Distributed Machine Learning
- Online, Active, and Causal Learning
- Reinforcement Learning
- Overview of Other Large/Notable Topics
