CPSC 531H: Machine Learning Theory

General Information

Course Website: http://www.cs.ubc.ca/~nickhar/W14

Lecture Time: Tuesdays & Thursdays, 11:00am-12:30pm (January-April 2014)

Lecture Room: ICICS 206

Instructor: Prof. Nicholas Harvey, X851, nickhar@cs.ubc.ca
TA: Bobak Shahriari, bshahr@cs.ubc.ca

 

This is a graduate course on some theoretical aspects of machine learning. The emphasis is on foundations and on results with rigorous proofs. The viewpoint is much more computational than statistical.

 

Syllabus

 

Assignments

·         Assignment 1, Due Thursday January 30th.

·         Revised version of Problem 3 on Assignment 1.

·         Assignment 2, Due Thursday February 27th.

·         Assignment 3, Due Thursday March 20th.

·         Assignment 4, Due Tuesday April 8th or Thursday April 10th.

 

Projects

Proposal Due: Thursday February 27th, in class.

Oral presentations: Monday April 14th in ICCS 104.

Written report: due on the same day as the oral presentations.

 

Lectures

 

1. Tu 1/7: Intro and start of PAC learning. Readings: Mohri et al Ch 1, 2.1; Schapire 1; Balcan 1
2. Th 1/9: Sample complexity with finite hypothesis sets. Readings: Mohri et al 2.1, 2.2; Schapire 2, 3, 4; Balcan 2
3. Tu 1/14: Sample complexity and the growth function. Readings: Schapire 4, 5; Balcan 7, 9; Blum 5, 6; Pitassi 6; Abu-Mostafa 5, 6
4. Th 1/16: VC-dimension. Readings: Mohri et al 3.3; Schapire 6; Balcan 8; Blum 5, 6; Pitassi 6; Abu-Mostafa 7
5. Tu 1/21: Sample complexity in the inconsistent case. Readings: Mohri et al 2.3, 2.4; Schapire 7; Balcan 7, 8
6. Th 1/23: Boosting by filtering. Readings: Kearns-Vazirani Ch 4; Blum 7
7. Tu 1/28: Boosting by filtering proof; Start of AdaBoost. Readings: Mohri et al Ch 6; Schapire 10; Ihler video
8. Th 1/30: AdaBoost proof, Margins. Readings: Mohri et al Ch 6; Schapire 10, 11
9. Tu 2/4: Perceptron, Margin-Perceptron. Readings: Mohri et al 7.3.1; Balcan 3; Blum 4; Bartlett 2
10. Th 2/6: Perceptron Generalization Error and Kernels. Readings: Mohri et al 7.3.1, 5.1 and 5.2.1; Bartlett 3; Balcan 15; Blum 4
11. Tu 2/11: SVM algorithm. Readings: Mohri et al Ch 4, App B; Ng 3
12. Th 2/13: Kernelized SVM algorithm. Readings: Mohri et al Ch 4, App B; Ng 3
13. Tu 2/25: Johnson-Lindenstrauss dimensionality reduction. Readings: Mohri et al 12.4; My lecture notes; Bouchard-Côté slides
14. Tu 3/4: Subspace embeddings, linear regression. Readings: Nelson 14
15. Th 3/6: Fast Johnson-Lindenstrauss transform. Readings: Nelson 12; Ailon-Chazelle Survey
16. Tu 3/11: Online learning model, Halving algorithm. Readings: Mohri et al 7.1-7.2; SS Ch 1; Kleinberg 1.1-1.3; Bartlett 21.1; Balcan 24
17. Th 3/13: Deterministic & randomized weighted majority. Readings: Mohri et al 7.2; Kleinberg 1.4-1.5; Bartlett 21.2; Szepesvari 4; Balcan 24
18. Tu 3/18: Online convex optimization, Follow the leader. Readings: SS 2-2.2; Szepesvari 5
20. Tu 3/25: Multi-armed bandits. Readings: Kleinberg 8; Briest 6
21. Th 3/27: Spectral partitioning, Spectral graph theory. Readings: Singh 21; von Luxburg Tutorial; Lau 1
22. Tu 4/1: Eigenvalues of graphs, conductance. Readings: Lau 2; Spielman 2
23. Th 4/3: Cheeger's Inequality. Readings: Lau 2; Spielman 6
24. Tu 4/8: Multi-way Spectral Partitioning. Readings: Singh 21; von Luxburg Tutorial; Lau 4

 

Further reading

The PAC Model

·         Valiant. A Theory of the Learnable.

 

Hardness of Proper PAC learning k-term DNFs

·         Pitassi’s Machine Learning Theory, Lecture 12

 

Hardness of Learning Halfspaces

·         Bshouty, Burroughs. Maximizing agreements and coagnostic learning.

·         Feldman et al. On agnostic learning of parities, monomials and halfspaces.

·         Guruswami, Raghavendra. Hardness of learning halfspaces with noise.

 

Boosting

·         Boosting by Filtering: Schapire. The Strength of Weak Learnability.

·         Hedge and AdaBoost: Freund, Schapire. A decision-theoretic generalization of on-line learning and an application to Boosting. EuroCOLT ’95 and JCSS ’97.

 

Margin-Perceptron

·         Balcan, Blum. On a Theory of Learning with Similarity Functions.

 

Convex Optimization

·         Stephen Boyd’s video and slides on Lagrangians.

 

Faster Least Squares

·         Drineas, Mahoney, Muthukrishnan, Sarlos. Faster Least Squares Approximation.

·         Mahoney. Randomized Algorithms for Matrices and Data.

 

Online Learning

·         Deterministic & Randomized Weighted Majority: Littlestone, Warmuth. The Weighted Majority Algorithm. FOCS ’89, Inf and Comp ’94.

·         Hedge and AdaBoost: Freund, Schapire. A decision-theoretic generalization of on-line learning and an application to Boosting. EuroCOLT ’95 and JCSS ’97.

·         Optimal bounds: Cesa-Bianchi, Freund, Haussler, Helmbold, Schapire, Warmuth. How to Use Expert Advice. STOC ’93, JACM ’97.

 

Bandits

·         Wikipedia. The multi-armed bandit problem.

·         Kleinberg. Online Decision Problems with Large Strategy Sets. PhD Thesis, ’05.

·         Auer, Cesa-Bianchi, Freund, Schapire. The non-stochastic multi-armed bandit problem. FOCS ’95, SICOMP ’02.

·         Bubeck, Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. FnTML ’12.

 

Spectral Clustering

·         Charles Martin’s blog post. Spectral Clustering: A Quick Overview.

·         SciKit-Learn’s Spectral clustering package, source code.

·         Shi, Malik. Normalized Cuts and Image Segmentation. CVPR ’97, IEEE Trans PAMI ’00.

·         Ng, Jordan, Weiss. On Spectral Clustering: Analysis and an Algorithm. NIPS ’01.

·         Lee, Oveis Gharan, Trevisan. Multi-way spectral partitioning and higher-order Cheeger inequalities. STOC ’12, JACM ’14.

 

Similar classes at other universities

·         Robert Schapire’s “Theoretical Machine Learning” at Princeton

·         Nina Balcan’s “Machine Learning Theory” at Georgia Tech

·         Avrim Blum’s “Machine Learning Theory” at Carnegie Mellon

·         Peter Bartlett’s “Statistical Learning Theory” at Berkeley

·         Toni Pitassi’s “Machine Learning Theory” at Toronto

·         Sham Kakade and Ambuj Tewari’s “Learning Theory” at U Chicago

 

Other moderately related courses

·         Csaba Szepesvari’s “Online Learning” at U Alberta

·         Sanjeev Arora’s “Is Learning Easy?” at Princeton

·         Ankur Moitra’s “Algorithmic Aspects of Machine Learning” at MIT

 

UBC classes on machine learning

·         CPSC 340 “Machine Learning and Data Mining” Fall 2013

·         CPSC 532H “Machine Learning and Optimisation for Automated Algorithm Design” Fall 2013

·         CPSC 540 “Machine Learning” Winter 2013

·         STAT 447B “Methods for Statistical Learning” Fall 2013