“All algorithms equally matter,
but some matter more than others.”
- George Orwell (paraphrased)
2016-17
Winter Term 2
General Information
Course Website: http://www.cs.ubc.ca/~nickhar/W17
Lecture Time: Mondays &
Fridays, 12:00-1:30pm
Lecture Room: FSC 1003
Instructor: Prof. Nicholas
Harvey, X851, nickhar@cs.ubc.ca
Piazza
Assignments
· Assignment 0, Due Friday Jan 20th, in class.
· Assignment 1, Due Friday March 3rd, in class.
· Assignment 2, Due ≤(1+e)×(Monday
April 3rd).
Schedule
|
Date |
Topics |
Readings |
1 |
M 1/9 |
MapReduce Model |
Karloff-Suri-Vassilvitskii |
2 |
F 1/13 |
TeraSort |
O’Malley |
3 |
M 1/16 |
Connected
Components, Min Spanning Tree |
Karloff-Suri-Vassilvitskii, Lattanzi et al |
4 |
F 1/20 |
Counting
Triangles |
Suri-Vassilvitskii, Pagh-Tsourakakis |
5 |
M 1/23 |
Fast
Greedy Algorithms in MapReduce |
Kumar-Moseley-Vassilvitskii-Vattani |
6 |
F 1/27 |
Submodular
Maximization in MapReduce |
Kumar-Moseley-Vassilvitskii-Vattani |
7 |
M 1/30 |
Matroid basics & greedy algorithm |
|
F 2/3 |
No class:
UBC Theory Workshop |
|
|
8 |
M 2/6 |
Submodular
Maximization: Continuous Greedy |
Calinescu-Chekuri-Pal-Vondrak |
10 |
F 2/10 |
Submodular
Maximization: Pipage Rounding |
Calinescu-Chekuri-Pal-Vondrak |
M 2/13 |
No class:
Family Day |
|
|
11 |
F 2/17 |
Non-monotone
submodular optimization |
Feige-Mirrokni-Vondrak, |
M 2/20 |
No class:
Reading Week |
|
|
F 2/24 |
No class:
Reading Week |
|
|
13 |
M 2/27 |
CountSketch, Sparse Recovery |
|
14 |
F 3/3 |
L0-sampling |
Jowhari-Saglam-Tardos |
15 |
M 3/6 |
Graph
Sketching |
Ahn-Guha-McGregor |
16 |
F 3/10 |
Locality
sensitive hashing for Hamming Dist |
Har-Peled, Indyk, Motwani |
17 |
M 3/13 |
Locality
sensitive hashing for L2 |
Datar-Immorlica-Indyk-Mirrokni |
18 |
F 3/17 |
Online
bipartite matching |
Devanur-Jain-Kleinberg, Mehta 6.3 |
19 |
M 3/20 |
AdWords |
Buchbinder-Jain-Naor, Mehta 6.1 |
20 |
F 3/24 |
Erasure
codes, Fountain codes |
Mitzenmacher, McKay |
21 |
M 3/27 |
LT codes,
Raptor codes |
McKay, Luby, Shokrollahi-Luby |
22 |
F 3/31 |
Locally
decodable codes |
Oggier-Datta, Wellenzohn, Yekhanin |
Readings
Less theoretical books
·
Lin,
Dyer. "Data-Intensive Text Processing with MapReduce"
·
Rajaraman,
Leskovec, Ullman. Mining of Massive Datasets
Probability Background
·
Nick’s lecture notes
on Chernoff bounds
Big Data Models
·
Valiant "A
Bridging Model for Parallel Computation" CACM 1990
·
Campbell,
"A Survey of Models of Parallel Computation" TR 1997
·
Karloff, Suri, Vassilvitskii: "A Model of Computation for
MapReduce." SODA 2010
Big Data Algorithms
· O'Malley "TeraByte Sort on Apache Hadoop". TR 2008
Big Data Graph Processing
· Cohen.
"Graph Twiddling in a MapReduce World". Report 2009
· S Suri, S
Vassilvitskii "Counting triangles and the curse
of the last reducer" WWW'11
· Pagh, Tsourakakis. "Colorful triangle counting and a
MapReduce implementation". IPL 12
Counting Triangles
· Pagh, Tsourakakis. "Colorful triangle counting and a
MapReduce implementation". IPL 12
Matroid &
Submodular Function Basics
·
Jan Vondrak’s lecture notes. Lectures 7, 8, 9
·
Michel Goemans’ lecture notes
·
Chandra Chekuri’s 2010 lecture notes. Lectures 14, 15, 16, 20
·
Lex Schrijver’s
lecture notes. Chapter 10
· McCormick
“Submodular Function Minimization”
Submodular Maximization
· Krause, Golovin. “Submodular Function Maximization.” 2014
o
This is an overview with no proofs, but includes some discussion
of applications.
·
Dughmi. “Submodular function: extensions, distributions and
algorithms: a survey” 2009
·
Jan Vondrak’s lecture notes. Lectures 16, 17, 19
· Feige, Mirrokni, Vondrak. “Maximizing non-monotone submodular functions.”
FOCS 2007, SICOMP 2011
· Buchbinder, Feldman, Naor,
Schwartz. Submodular maximization with cardinality constraints. SODA 2014
Sketching
· Count Sketch
o
Chandra Chekuri’s 2014 lecture notes. Lectures 6, 7.
o
Jelani
Nelson’s lecture notes. Lecture 7.
o
McGregor,
Muthukurishnan. Textbook draft. Section 3.2
o
Gilbert,
Indyk. "Sparse recovery using sparse
matrices". Section II.B.
o
Charikar, Chen, Farach-Colton.
"Finding frequent items in data streams". ICALP 2002.
§ The
original paper, but analysis is not as clean as the later expositions.
· L0-sampling
o
Chandra Chekuri’s 2014 lecture notes. Lectures 9.
o
Jelani
Nelson’s lecture notes. Lecture 7.
o
Jowhari,
Saglam, Tardos. "Tight
bounds for Lp samplers". PODS 2011
· Graph
Connectivity
Locality Sensitive Hashing
o The journal version is significantly easier to read than
the conference versions!
Online Matching
·
Online Bipartite Matching (KVV)
o Mehta.
"Online Matching and Ad Allocation". FNTTCS 2013
§ Primal-dual analysis of Online Bipartite Matching in in
Section 6.3
§ The source for Mehta Section 6.3
o Birnbaum, Mathieu.
“On-line Bipartite Matching Made Simple”. SIGACT News 2008
§ A simplification of KVV
o Karp, Vazirani, Vazirani. “An Optimal
Algorithm for On-line Bipartite Matching”. STOC 1990
§ The original paper. Quite hard to read, and hard to
convince yourself of correctness.
·
AdWords (MSVV)
o Mehta.
"Online Matching and Ad Allocation". FNTTCS 2013
§ Primal-dual analysis of AdWords problem is in Section 6.1
o Buchbinder, Jain, Naor.
"Online Primal-Dual Algorithms for Maximizing Ad-Auctions Revenue".
§ The source for Mehta Section 6.1
Codes
·
Erasure codes
o McKay.
“Information theory, inference, and learning algorithms”. Chapter 50
o Mitzenmacher. “Digital fountains: a survey and look
forward”. ITW 2004
o Shokrollahi, Luby. “Raptor codes”
FNTTCS 2011
·
Codes for distributed storage
o Wellenzohn "Erasure coding in Distributed Storage
Systems". TR 2015
o Yekhanin “Locally decodable codes” FNTTCS 2012
o Huang
et al “Erasure coding in Windows Azure storage” USENIX ATC 2012
Upcoming topics that we might cover
Streaming Graphs
·
Charikar,
O'Callaghan, Panigrahy. "Better streaming
algorithms for clustering problems". STOC03
·
Feigenbaum, Kannan,
McGregor, Suri, and Zhang. On graph problems in a semi-streaming model. TCS
2005
·
Ahn, Guha, McGregor. "Graph Sketches: Sparsifiers,
Spanners and Subgraphs". PODS 2012
·
Dynamic
Graph Connectivity
§ Kapron, King, Mountjoy.
"Dynamic Connectivity in Polylogarithmic worst
case time" SODA 2013
§ Wang. "An
Improved Randomized Data Structure for Dynamic Graph Connectivity". Arxiv
·
Kapralov. Better bounds for matchings in the streaming model, SODA
2013
·
Kapralov,
Khanna, Sudan. Approximating matching size from random streams, SODA 2014
Good readings for projects:
Papers that we didn’t, or probably won’t cover
Big Data Models
Big
Data Algorithms
Triangle-Counting Algorithms
·
Eden, Levi, Ron, Seshadhri.
"Approximately Counting Triangles in Sublinear Time". FOCS15
·
McGregor, Vorotnikova, Vu. "Better algorithms for counting
triangles in data streams" PODS16
Big Data Submodular
Submodular Maximization
·
Ashwinkumar, Vondrak. “Fast
algorithms for maximizing submodular functions.” SODA 2014
·
Buchbinder,
Feldman. “Deterministic algorithms for submodular maximization problems.” SODA
2016
Streaming
Algorithms
·
Kane, Nelson,
Woodruff. “An optimal algorithm for the distinct elements problem”. PODS
10
·
Andoni, Krauthgamer, Onak. “Streaming algorithms via precision sampling”, FOCS
2011
·
Agarwal, Cormode, Huang, Phillips, Wei, Yi. Mergeable
Summaries. PODS12, TODS13
·
Li, Nguyen, Woodruff. Turnstile Streaming Algorithms
Might As Well Be Linear Sketches. STOC 14
·
Minton, Price. Improved
Concentration Bounds for Count-Sketch. SODA 2014
·
Bellazougui,
Zhang, Edit Distance: Sketching, Streaming and Document Exchange. FOCS 16
·
Mossel,
Kannan, Yaroslavtsev. Linear Sketching over F2.
Unpublished
Graph
Streaming
· McGregor, Tench,
Vorotnikova, Vu. “Densest Subgraph in Dynamic Graph Streams”.
MFCS 2015
Potential Topics
·
Distributed Models
o
Algorithms for Map Reduce
o
Graphs in MapReduce model
·
Big Data Algorithms
o
MinHash,
SimHash, LSH
o
Streaming Algorithms: Count Sketch,
HyperLogLog, …
o
Linear Algebra: Subspace Embeddings, Fast Regression, …
·
Machine Learning
o
Stochastic Gradient Descent
o
Online Learning
o
Boosting
o
Matrix Completion
·
Online Advertising
·
Systems
o
Codes in Cloud Storage
o
Cloud Scheduling
Guidance on Topics
I thank the following people for
their ideas regarding the topics.
·
Google:
Vahab Mirrokni, Eric
Lehman, Petar Maymounkov,
Ravi Kumar, David Lowe, Nando de Freitas.
·
Microsoft:
Mohit
Singh.
·
UBC:
Ivan Beschastnikh, Mark Schmidt.