The notation is fairly consistent across topics, which makes it easier to see relationships, and the topics are meant to be read *in order*: the difficulty increases gradually, and concepts are defined at their first occurrence.

The first set of notes is mainly from CPSC 340, an undergraduate-level course on machine learning and data mining. Related readings and assignments are available from the course homepage.

- Overview
- Data Exploration
- Decision Trees (Notes on Big-O Notation)
- Learning Theory
- Generative Models (Notes on Probability)
- Non-Parametric Models
- Ensemble Methods

- Linear Regression (Notes on Linear Algebra)
- Non-Linear Regression (Notes on Linear/Quadratic Gradients)
- Regularization
- Gradient Descent
- Logistic Regression (Notes on Feature Engineering)
- Support Vector Machines
- Kernel Methods
- Stochastic Gradient
- Feature Selection
- L1-Regularization
- Multi-Class Regression

- Principal Component Analysis
- More PCA
- Sparse Matrix Factorization
- Recommender Systems
- Multi-Dimensional Scaling

- Matrix Notation
- MAP Estimation
- Minimizing Maxes of Linear Functions
- Convex Functions (Notes on Norms, Notes on Max and Argmax)
- Gradient Descent Convergence Rate
- Gradient Descent for Logistic Regression
- Practical Issues and Newton-Like Methods (Notes on Implementing Gradient Descent)
- How Hard is Optimization?
- Weaker Assumptions for Linear Convergence
- L1-Regularization and Coordinate Optimization (Notes on Convexity Inequalities)
- Group Sparsity
- Projected Gradient
- Proximal Gradient
- Structured Sparsity
- Stochastic Subgradient
- Convergence Rate
- Practical Subgradient Methods
- Stochastic Average Gradient
- Kernel Methods
- Valid Kernels and Representer Theorem
- Fenchel Duality
- Large-Scale Kernel Methods

- Density Estimation
- Univariate Gaussian
- Multivariate Gaussian
- Mixture Models
- Learning with Hidden Values
- Expectation Maximization
- Monotonicity of EM
- Kernel Density Estimation
- Factor Analysis (Notes on EM)
- Independent Component Analysis
- Markov Chains
- Monte Carlo Methods
- Message Passing

- Directed Acyclic Graphical Models
- D-Separation
- D-Separation and Plate Notation
- Learning and Inference in DAGs
- Undirected Graphical Models
- Complexity of Inference in Graphical Models
- ICM and Gibbs Sampling
- Variational Inference
- Block Approximate Inference
- Hidden Markov Models
- Boltzmann Machines
- Log-Linear Models
- Structured Prediction
- Conditional Random Fields
- Log-Linear Cleanup and Structure Learning
- CRF Cleanup and Beyond UGMs
- Structured Support Vector Machines

- Neural Network Review
- Deep Conditional Random Fields
- Convolutional Neural Networks
- More CNNs
- Fully-Convolutional Networks

- Bayesian Statistics
- Empirical Bayes
- Conjugate Priors
- Hierarchical Bayes
- Topic Models
- Rejection and Importance Sampling
- Metropolis-Hastings Algorithm
- Non-Parametric Bayes

- Active Learning
- Causality
- Copulas
- Grammars
- Learning Theory
- Metric Learning
- Online Learning
- Reinforcement Learning
- Relational Models
- Submodularity
- Spectral Methods
