CPSC 340 - Machine Learning and Data Mining

Course contents

Introduction

Defining machine learning and data mining

Relation to other fields (stats, databases, probability, information theory)

Scalability

Privacy issues and social impact

Applications in AI, computer vision, computer games, search engines, marketing, bioinformatics, robotics, HCI and graphics.

Exploratory Data Analysis

Linear algebra revision (eigenvectors !!!)

PageRank

The SVD, spectral methods and latent semantic indexing

Probabilistic component analysis

Examples: text mining, search engines, image compression and visualization

Graphical models

Introduction to discrete probability

Inference in Bayesian networks

Maximum likelihood and Bayesian learning

Model selection

Supervised learning

Introduction to continuous probability

Linear regression and classification (least squares and ridge)

Model assessment and cross-validation

Introduction to optimization

Nonlinear regression (neural nets and Gaussian processes)

Boosting and feature selection

Examples

Unsupervised learning

Nearest neighbours and K-means

Spectral kernel methods for clustering and semi-supervised learning

The EM algorithm

Mixture models for discrete and continuous data

Temporal methods: hidden Markov models & Kalman filters

Boltzmann machines and random fields

Examples: web mining, collaborative filtering, music and image clustering, automatic translation, spam filtering, computer games and object recognition.

Other forms of learning

Semi-supervised learning

Active learning

Reinforcement learning

Self-taught learning

LATEST:

The machine learning book by Hastie, Tibshirani and Friedman is now online: The Elements of Statistical Learning.
Chapters 14, 15 and 20 of the artificial intelligence book by Stuart Russell and Peter Norvig are strongly recommended reading for this course. I'll provide partial photocopies of chapters 14 and 15 in class. Chapter 20 is available online.
This AIspace page at UBC has lots of videos and applets about inference in directed probabilistic graphical models (aka Bayesian networks or belief networks).
For graphical models and Beta-Bernoulli models, I recommend A Tutorial on Learning with Bayesian Networks by David Heckerman.
Kevin Murphy has compiled a nice page about Bayesian learning.
Wikipedia tutorial on the SVD
The following handout should help you with linear algebra revision: PDF
The homework should be handed in on Wednesday at the beginning of class. Please note that messy homework will be penalized: it is your responsibility to ensure that the material is presented in clear written form. All pseudocode must be handed in. Please don't forget to add your name and student number.

USEFUL LINKS:

Machine learning video lectures
Why stats: NYTimes article
A cool company: Numenta
A video lecture about Python's matplotlib package
The homepage of Andrew Ng