CPSC 531H: Machine Learning Theory

General Information

Course Website: http://www.cs.ubc.ca/~nickhar/W14

Lecture Time: Tuesdays & Thursdays, 11:00am-12:30pm (January-April 2014)

Lecture Room: ICICS 206

Instructor: Prof. Nicholas Harvey, X851, nickhar@cs.ubc.ca
TA: Bobak Shahriari, bshahr@cs.ubc.ca

 

This is a graduate course on some theoretical aspects of machine learning. The emphasis is on foundations and on results with rigorous proofs. The viewpoint is much more computational than statistical.

 

Syllabus

 

Assignments

·         Assignment 1, Due Thursday January 30th.

·         Revised version of Problem 3 on Assignment 1.

·         Assignment 2, Due Thursday February 27th.

·         Assignment 3, Due Thursday March 20th.

·         Assignment 4, Due Tuesday April 8th or Thursday April 10th.

 

Projects

Proposal Due: Thursday February 27th, in class.

Oral presentations: Monday April 14th in ICCS 104.

Written report: due on the same day as the oral presentations.

 

Lectures

 

1. Tu 1/7: Intro and start of PAC learning. Readings: Mohri et al Ch 1, 2.1; Schapire 1; Balcan 1
2. Th 1/9: Sample complexity with finite hypothesis sets. Readings: Mohri et al 2.1, 2.2; Schapire 2, 3, 4; Balcan 2
3. Tu 1/14: Sample complexity and the growth function. Readings: Schapire 4, 5; Balcan 7, 9; Blum 5, 6; Pitassi 6; Abu-Mostafa 5, 6
4. Th 1/16: VC-dimension. Readings: Mohri et al 3.3; Schapire 6; Balcan 8; Blum 5, 6; Pitassi 6; Abu-Mostafa 7
5. Tu 1/21: Sample complexity in the inconsistent case. Readings: Mohri et al 2.3, 2.4; Schapire 7; Balcan 7, 8
6. Th 1/23: Boosting by filtering. Readings: Kearns-Vazirani Ch 4; Blum 7
7. Tu 1/28: Boosting by filtering proof; Start of AdaBoost. Readings: Mohri et al Ch 6; Schapire 10; Ihler video
8. Th 1/30: AdaBoost proof, Margins. Readings: Mohri et al Ch 6; Schapire 10, 11
9. Tu 2/4: Perceptron, Margin-Perceptron. Readings: Mohri et al 7.3.1; Balcan 3; Blum 4; Bartlett 2
10. Th 2/6: Perceptron Generalization Error and Kernels. Readings: Mohri et al 7.3.1, 5.1 and 5.2.1; Bartlett 3; Balcan 15; Blum 4
11. Tu 2/11: SVM algorithm. Readings: Mohri et al Ch 4, App B; Ng 3
12. Th 2/13: Kernelized SVM algorithm. Readings: Mohri et al Ch 4, App B; Ng 3
13. Tu 2/25: Johnson-Lindenstrauss dimensionality reduction. Readings: Mohri et al 12.4; My lecture notes; Bouchard-Côté slides
14. Tu 3/4: Subspace embeddings, linear regression. Readings: Nelson 14
15. Th 3/6: Fast Johnson-Lindenstrauss transform. Readings: Nelson 12; Ailon-Chazelle Survey
16. Tu 3/11: Online learning model, Halving algorithm. Readings: Mohri et al 7.1-7.2; SS Ch 1; Kleinberg 1.1-1.3; Bartlett 21.1; Balcan 24
17. Th 3/13: Deterministic & randomized weighted majority. Readings: Mohri et al 7.2; Kleinberg 1.4-1.5; Bartlett 21.2; Szepesvari 4; Balcan 24
18. Tu 3/18: Online convex optimization, Follow the leader. Readings: SS 2-2.2; Szepesvari 5
20. Tu 3/25: Multi-armed bandits. Readings: Kleinberg 8; Briest 6
21. Th 3/27: Spectral partitioning, Spectral graph theory. Readings: Singh 21; von Luxburg Tutorial; Lau 1
22. Tu 4/1: Eigenvalues of graphs, conductance. Readings: Lau 2; Spielman 2
23. Th 4/3: Cheeger's Inequality. Readings: Lau 2; Spielman 6
24. Tu 4/8: Multi-way Spectral Partitioning. Readings: Singh 21; von Luxburg Tutorial; Lau 4

 

Further reading

The PAC Model

·         Valiant. A Theory of the Learnable.

 

Hardness of Proper PAC learning k-term DNFs

·         Pitassi’s Machine Learning Theory, Lecture 12

 

Hardness of Learning Halfspaces

·         Bshouty, Burroughs. Maximizing agreements and coagnostic learning.

·         Feldman et al. On agnostic learning of parities, monomials and halfspaces.

·         Guruswami, Raghavendra. Hardness of learning halfspaces with noise.

 

Boosting

·         Boosting by Filtering: Schapire. The Strength of Weak Learnability.

·         Hedge and AdaBoost: Freund, Schapire. A decision-theoretic generalization of on-line learning and an application to Boosting. EuroCOLT ’95 and JCSS ’97.

 

Margin-Perceptron

·         Balcan, Blum. On a Theory of Learning with Similarity Functions.

 

Convex Optimization

·         Stephen Boyd’s video and slides on Lagrangians.

 

Faster Least Squares

·         Drineas, Mahoney, Muthukrishnan, Sarlos. Faster Least Squares Approximation.

·         Mahoney. Randomized Algorithms for Matrices and Data.

 

Online Learning

·         Deterministic & Randomized Weighted Majority: Littlestone, Warmuth. The Weighted Majority Algorithm. FOCS ’89, Inf and Comp ’94.

·         Hedge and AdaBoost: Freund, Schapire. A decision-theoretic generalization of on-line learning and an application to Boosting. EuroCOLT ’95 and JCSS ’97.

·         Optimal bounds: Cesa-Bianchi, Freund, Haussler, Helmbold, Schapire, Warmuth. How to Use Expert Advice. STOC ’93, JACM ’97.

 

Bandits

·         Wikipedia. The multi-armed bandit problem.

·         Kleinberg. Online Decision Problems with Large Strategy Sets. PhD Thesis, ’05.

·         Auer, Cesa-Bianchi, Freund, Schapire. The non-stochastic multi-armed bandit problem. FOCS ’95, SICOMP ’02.

·         Bubeck, Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. FnTML ’12.

 

Spectral Clustering

·         Charles Martin’s blog post. Spectral Clustering: A Quick Overview.

·         SciKit-Learn’s Spectral clustering package, source code.

·         Shi, Malik. Normalized Cuts and Image Segmentation. CVPR ’97, IEEE Trans PAMI ’00.

·         Ng, Jordan, Weiss. On Spectral Clustering: Analysis and an Algorithm. NIPS ’01.

·         Lee, Oveis Gharan, Trevisan. Multi-way spectral partitioning and higher-order Cheeger inequalities. STOC ’12, JACM ’14.

 

Similar classes at other universities

·         Robert Schapire’s “Theoretical Machine Learning” at Princeton

·         Nina Balcan’s “Machine Learning Theory” at Georgia Tech

·         Avrim Blum’s “Machine Learning Theory” at Carnegie Mellon

·         Peter Bartlett’s “Statistical Learning Theory” at Berkeley

·         Toni Pitassi’s “Machine Learning Theory” at Toronto

·         Sham Kakade and Ambuj Tewari’s “Learning Theory” at U Chicago

 

Other moderately related courses

·         Csaba Szepesvari’s “Online Learning” at U Alberta

·         Sanjeev Arora’s “Is Learning Easy?” at Princeton

·         Ankur Moitra’s “Algorithmic Aspects of Machine Learning” at MIT

 

UBC classes on machine learning

·         CPSC 340 “Machine Learning and Data Mining” Fall 2013

·         CPSC 532H “Machine Learning and Optimisation for Automated Algorithm Design” Fall 2013

·         CPSC 540 “Machine Learning” Winter 2013

·         STAT 447B “Methods for Statistical Learning” Fall 2013