General Information
Course Website: http://www.cs.ubc.ca/~nickhar/W14
Lecture Time: Tuesdays & Thursdays, 11:00am-12:30pm (January-April 2014)
Lecture Room: ICICS 206
Instructor: Prof. Nicholas Harvey, X851, nickhar@cs.ubc.ca
TA: Bobak Shahriari, bshahr@cs.ubc.ca
This is a graduate course on some theoretical aspects of machine learning. The emphasis is on foundations and on results with rigorous proofs. The viewpoint is much more computational than statistical.
Assignments
· Assignment 1, due Thursday January 30th.
· Revised version of Problem 3 on Assignment 1.
· Assignment 2, due Thursday February 27th.
· Assignment 3, due Thursday March 20th.
· Assignment 4, due Tuesday April 8th or Thursday April 10th.
Course Project
Proposal due: Thursday February 27th, in class.
Oral presentations: Monday April 14th in ICCS 104.
Written report: due on the same day as the oral presentations.
Lectures
| # | Date | Topics | Readings |
| 1 | Tu 1/7 | Intro and start of PAC learning | |
| 2 | Th 1/9 | Sample complexity with finite hypothesis sets | |
| 3 | Tu 1/14 | Sample complexity and the growth function | Schapire 4, 5; Balcan 7, 9; Blum 5, 6; Pitassi 6; Abu-Mostafa 5, 6 |
| 4 | Th 1/16 | VC-dimension | Mohri et al 3.3; Schapire 6; Balcan 8; Blum 5, 6; Pitassi 6; Abu-Mostafa 7 |
| 5 | Tu 1/21 | Sample complexity in the inconsistent case | |
| 6 | Th 1/23 | Boosting by filtering | Kearns-Vazirani Ch 4; Blum 7 |
| 7 | Tu 1/28 | Boosting by filtering proof; Start of AdaBoost | |
| 8 | Th 1/30 | AdaBoost proof, Margins | |
| 9 | Tu 2/4 | Perceptron, Margin-Perceptron | |
| 10 | Th 2/6 | Perceptron generalization error and Kernels | Mohri et al 7.3.1, 5.1 and 5.2.1; Bartlett 3; Balcan 15; Blum 4 |
| 11 | Tu 2/11 | SVM algorithm | Mohri et al Ch 4, App B; Ng 3 |
| 12 | Th 2/13 | Kernelized SVM algorithm | Mohri et al Ch 4, App B; Ng 3 |
| 13 | Tu 2/25 | Johnson-Lindenstrauss dimensionality reduction | Mohri et al 12.4; My lecture notes; Bouchard-Côté slides |
| 14 | Tu 3/4 | Subspace embeddings, linear regression | Nelson 14 |
| 15 | Th 3/6 | Fast Johnson-Lindenstrauss transform | Nelson 12; Ailon-Chazelle Survey |
| 16 | Tu 3/11 | Online learning model, Halving algorithm | Mohri et al 7.1-7.2; SS Ch 1; Kleinberg 1.1-1.3; Bartlett 21.1; Balcan 24 |
| 17 | Th 3/13 | Deterministic & randomized weighted majority | Mohri et al 7.2; Kleinberg 1.4-1.5; Bartlett 21.2; Szepesvari 4; Balcan 24 |
| 18 | Tu 3/18 | Online convex optimization, Follow the leader | |
| 20 | Tu 3/25 | Multi-armed bandits | |
| 21 | Th 3/27 | Spectral partitioning, Spectral graph theory | Singh 21; von Luxburg Tutorial; Lau 1 |
| 22 | Tu 4/1 | Eigenvalues of graphs, conductance | |
| 23 | Th 4/3 | Cheeger's Inequality | |
| 24 | Tu 4/8 | Multi-way spectral partitioning | Singh 21; von Luxburg Tutorial; Lau 4 |
Further reading
The PAC Model
· Valiant. A Theory of the Learnable.
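Valiant's PAC model underlies lectures 1-3. For reference, the standard sample-complexity bound for a finite hypothesis class in the consistent (realizable) case is the following textbook fact (the notation here is generic, not tied to the course's notes):

```latex
% With probability at least 1-\delta over m i.i.d. samples, every
% hypothesis in a finite class H that is consistent with the sample
% has true error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right).
```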
Hardness of Proper PAC learning k-term DNFs
· Pitassi's Machine Learning Theory, Lecture 12
Hardness of Learning Halfspaces
· Bshouty, Burroughs. Maximizing agreements and coagnostic learning.
· Feldman et al. On agnostic learning of parities, monomials and halfspaces.
· Guruswami, Raghavendra. Hardness of learning halfspaces with noise.
Boosting
· Boosting by Filtering: Schapire. The Strength of Weak Learnability.
· Hedge and AdaBoost: Freund, Schapire. A decision-theoretic generalization of on-line learning and an application to Boosting. EuroCOLT '95 and JCSS '97.
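The Freund-Schapire paper above introduces AdaBoost, covered in lectures 7-8. A minimal sketch of its reweighting loop, assuming the caller supplies a weak learner (the `weak_learn` name and interface are illustrative, not from the paper):

```python
import numpy as np

def adaboost(X, y, weak_learn, T):
    """Minimal AdaBoost sketch. Labels y are in {-1, +1};
    weak_learn(X, y, w) returns a hypothesis h with h(X) in {-1, +1}
    and weighted error below 1/2 under the distribution w."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # distribution over examples
    hyps, alphas = [], []
    for _ in range(T):
        h = weak_learn(X, y, w)
        preds = h(X)
        # weighted error, clipped away from 0 and 1 for numerical safety
        eps = np.clip(np.sum(w[preds != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)
        w = w * np.exp(-alpha * y * preds)   # up-weight mistakes
        w = w / w.sum()
        hyps.append(h)
        alphas.append(alpha)
    # final classifier: sign of the alpha-weighted vote
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hyps)))
```

With decision stumps as the weak learner, this drives the training error to zero exponentially fast in T, which is the content of the AdaBoost proof in lecture 8.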
Margin-Perceptron
· Balcan, Blum. On a Theory of Learning with Similarity Functions.
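For orientation, lecture 9's starting point is the classic Perceptron, which the Margin-Perceptron refines. A minimal sketch of the mistake-driven update (a standard presentation, not taken from the paper above):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classic Perceptron sketch. Labels y are in {-1, +1}; returns a
    weight vector w with sign(w . x) = y on linearly separable data
    (separable through the origin)."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:   # mistake (or on the boundary)
                w = w + y_i * x_i           # rotate w toward the example
                mistakes += 1
        if mistakes == 0:                   # converged: a full pass with no errors
            break
    return w
```

The mistake bound (at most (R/gamma)^2 updates for margin gamma and radius R) is what the margin analysis in lectures 9-10 makes precise.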
Convex Optimization
· Stephen Boyd's video and slides on Lagrangians.
Faster Least Squares
· Drineas, Mahoney, Muthukrishnan, Sarlos. Faster Least Squares Approximation.
· Mahoney. Randomized Algorithms for Matrices and Data.
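The sketch-and-solve idea behind these papers (and lectures 14-15) is to compress a tall n x d least-squares problem to a small m x d one. A minimal illustration using a dense Gaussian sketch; the cited papers use fast structured transforms instead, so this is a simplified stand-in:

```python
import numpy as np

def sketched_lstsq(A, b, m, rng=None):
    """Sketch-and-solve least squares: replace min ||Ax - b|| by the
    much smaller problem min ||S A x - S b||, where S is a random
    m x n sketching matrix (here: dense Gaussian, for simplicity)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = A.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # subspace embedding (w.h.p.)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```

When m is large enough that S embeds the column span of [A, b], the sketched solution's residual is within a (1 + epsilon) factor of optimal, which is the guarantee the Drineas et al. paper establishes for their faster sketches.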
Online Learning
· Deterministic & Randomized Weighted Majority: Littlestone, Warmuth. The Weighted Majority Algorithm. FOCS '89, Inf and Comp '94.
· Hedge and AdaBoost: Freund, Schapire. A decision-theoretic generalization of on-line learning and an application to Boosting. EuroCOLT '95 and JCSS '97.
· Optimal bounds: Cesa-Bianchi, Freund, Haussler, Helmbold, Schapire, Warmuth. How to Use Expert Advice. STOC '93, JACM '97.
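The Littlestone-Warmuth algorithm from lecture 17 is short enough to sketch directly. A minimal version of Randomized Weighted Majority on binary predictions (the interface and parameter names are illustrative):

```python
import numpy as np

def weighted_majority(expert_preds, outcomes, eta=0.5, rng=None):
    """Randomized Weighted Majority sketch.
    expert_preds: (T, n) array of 0/1 predictions from n experts.
    outcomes: length-T array of 0/1 true outcomes.
    Each round, follow one expert drawn proportionally to the weights;
    every expert that errs loses a (1 - eta) factor of its weight."""
    if rng is None:
        rng = np.random.default_rng(0)
    T, n = expert_preds.shape
    w = np.ones(n)
    mistakes = 0
    for t in range(T):
        i = rng.choice(n, p=w / w.sum())          # sample an expert
        if expert_preds[t, i] != outcomes[t]:
            mistakes += 1
        wrong = expert_preds[t] != outcomes[t]
        w[wrong] *= (1 - eta)                     # multiplicative penalty
    return mistakes, w
```

The multiplicative update is what yields the regret bound of lecture 17: the learner's expected mistakes are within a small factor (plus O(log n / eta)) of the best expert's.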
Bandits
· Wikipedia. The multi-armed bandit problem.
· Kleinberg. Online Decision Problems with Large Strategy Sets. PhD Thesis, '05.
· Auer, Cesa-Bianchi, Freund, Schapire. The non-stochastic multi-armed bandit problem. FOCS '95, SICOMP '02.
· Bubeck, Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. FnTML '12.
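To make the explore/exploit trade-off from lecture 20 concrete, here is a sketch of the simplest stochastic-bandit strategy, epsilon-greedy. This is a baseline for intuition only; the Auer et al. and Bubeck-Cesa-Bianchi references analyze stronger algorithms (EXP3, UCB):

```python
import numpy as np

def epsilon_greedy(means, T=5000, eps=0.1, rng=None):
    """Epsilon-greedy sketch for stochastic multi-armed bandits.
    means: true Bernoulli reward means (unknown to the learner).
    With probability eps explore a uniformly random arm; otherwise
    pull the arm with the best empirical mean so far."""
    if rng is None:
        rng = np.random.default_rng(0)
    k = len(means)
    counts = np.zeros(k)
    totals = np.zeros(k)
    for t in range(T):
        if rng.random() < eps or counts.min() == 0:
            arm = rng.integers(k)                  # explore
        else:
            arm = np.argmax(totals / counts)       # exploit
        reward = rng.random() < means[arm]         # Bernoulli reward draw
        counts[arm] += 1
        totals[arm] += reward
    return counts
```

Over time the empirically best arm dominates the pull counts, while the constant eps exploration keeps the estimates of every arm from going stale.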
Spectral Clustering
· Charles Martin's blog post. Spectral Clustering: A Quick Overview.
· SciKit-Learn's Spectral clustering package, source code.
· Shi, Malik. Normalized Cuts and Image Segmentation. CVPR '97, IEEE Trans PAMI '00.
· Ng, Jordan, Weiss. On Spectral Clustering: Analysis and an Algorithm. NIPS '01.
· Lee, Oveis Gharan, Trevisan. Multi-way spectral partitioning and higher-order Cheeger inequalities. STOC '12, JACM '14.
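The two-way spectral partitioning covered in lectures 21-23 reduces to one eigenvector computation. A minimal sketch using the normalized Laplacian (sign-rounding the second eigenvector, as in the basic analysis behind Cheeger's inequality):

```python
import numpy as np

def spectral_partition(A):
    """Two-way spectral partitioning sketch. A: symmetric 0/1 adjacency
    matrix with no isolated vertices. Forms the normalized Laplacian
    L = I - D^{-1/2} A D^{-1/2}, takes its second-smallest eigenvector,
    and rounds each vertex by sign. Returns a 0/1 label per vertex."""
    d = A.sum(axis=1)                                 # vertex degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                       # eigenvalues ascending
    fiedler = vecs[:, 1]                              # second eigenvector
    return (fiedler >= 0).astype(int)
```

Sign-rounding is the crudest rounding scheme; Cheeger's inequality (lecture 23) is proved with the sharper sweep-cut rounding over thresholds of this eigenvector.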
Similar classes at other universities
· Robert Schapire's "Theoretical Machine Learning" at Princeton
· Nina Balcan's "Machine Learning Theory" at Georgia Tech
· Avrim Blum's "Machine Learning Theory" at Carnegie Mellon
· Peter Bartlett's "Statistical Learning Theory" at Berkeley
· Toni Pitassi's "Machine Learning Theory" at Toronto
· Sham Kakade and Ambuj Tewari's "Learning Theory" at U Chicago
Other moderately related courses
· Csaba Szepesvari's "Online Learning" at U Alberta
· Sanjeev Arora's "Is Learning Easy?" at Princeton
· Ankur Moitra's "Algorithmic Aspects of Machine Learning" at MIT
UBC classes on machine learning
· CPSC 340 "Machine Learning and Data Mining" Fall 2013
· CPSC 532H "Machine Learning and Optimisation for Automated Algorithm Design" Fall 2013