The notation is fairly consistent across topics, which makes it easier to see relationships, and the topics are meant to be read *in order*: the difficulty increases gradually, and concepts are defined at their first occurrence.

The first set of notes is mainly from the September–December 2017 offering of CPSC 340, an undergraduate course on machine learning and data mining. Related readings and assignments are available on the course homepage. Where relevant, I've also included lectures from previous terms in which I covered different topics. Major changes since the 2016 version of the course include many improvements made by Mike Gelbart when he taught the course, and updating the slides to use a different colour for more-advanced or tangential "bonus material".

- Overview
- Exploratory Data Analysis
- Decision Trees (Notes on Big-O Notation)
- Fundamentals of Learning (Notation Guide)
- Probabilistic Classifiers (Notes on Probability, Probability Slides)
- Non-Parametric Models
- Ensemble Methods

- Least Squares
- Normal Equations (Notes on Linear Algebra, Notes on Linear/Quadratic Gradients)
- Gradient Descent
- Nonlinear Regression
- Feature Selection
- Regularization
- More Regularization
- Linear Classifiers
- More Linear Classifiers
- Kernel Trick
- Stochastic Gradient
- Multi-Class Classification
- MLE and MAP (Notes on Max and Argmax)

- Principal Component Analysis
- More PCA
- Sparse Matrix Factorization
- Recommender Systems
- Multi-Dimensional Scaling

- Matrix Notation
- MAP Estimation
- Minimizing Maxes of Linear Functions
- Convex Functions (Notes on Norms)
- Gradient Descent Convergence Rate
- Gradient Descent for Logistic Regression
- Practical Issues and Newton-Like Methods (Notes on Implementing Gradient Descent)
- How hard is optimization?
- Weaker Assumptions for Linear Convergence
- L1-Regularization and Coordinate Optimization (Notes on Convexity Inequalities)
- Group Sparsity
- Projected Gradient
- Proximal Gradient
- Structured Sparsity
- Stochastic Subgradient
- Convergence Rate
- Practical Subgradient Methods
- Stochastic Average Gradient
- Kernel Methods
- Valid Kernels and Representer Theorem
- Fenchel Duality
- Large-Scale Kernel Methods

- Density Estimation
- Univariate Gaussian
- Multivariate Gaussian
- Mixture Models
- Learning with Hidden Values
- Expectation Maximization
- Monotonicity of EM
- Kernel Density Estimation
- Factor Analysis (Notes on EM)
- Independent Component Analysis
- Markov Chains
- Monte Carlo Methods
- Message Passing

- Directed Acyclic Graphical Models
- D-Separation
- D-Separation and Plate Notation
- Learning and Inference in DAGs
- Undirected Graphical Models
- Complexity of Inference in Graphical Models
- ICM and Gibbs Sampling
- Variational Inference
- Block Approximate Inference
- Hidden Markov Models
- Boltzmann Machines
- Log-Linear Models
- Structured Prediction
- Conditional Random Fields
- Log-Linear Cleanup and Structure Learning
- CRF Cleanup and Beyond UGMs
- Structured Support Vector Machines

- Neural Network Review
- Deep Conditional Random Fields
- Convolutional Neural Networks
- More CNNs
- Fully-Convolutional Networks

- Bayesian Statistics
- Empirical Bayes
- Conjugate Priors
- Hierarchical Bayes
- Topic Models
- Rejection and Importance Sampling
- Metropolis-Hastings Algorithm
- Non-Parametric Bayes

- Active Learning
- Causality
- Copulas
- Grammars
- Learning Theory
- Metric Learning
- Online Learning
- Reinforcement Learning
- Relational Models
- Spectral Methods
- Submodularity
