CPSC 536N: Algorithms That Matter

“All algorithms equally matter, but some matter more than others.”
                                                       - George Orwell (paraphrased)

2016-17 Winter Term 2

General Information

Course Website:

Lecture Time: Mondays & Fridays, 12:00-1:30pm

Lecture Room: FSC 1003 

Instructor: Prof. Nicholas Harvey, X851,





Signup link



·       Assignment 0, Due Friday Jan 20th, in class.

·       Assignment 1, Due Friday March 3rd, in class.

·       Assignment 2, Due ≤(1+e)×(Monday April 3rd).








M 1/9

MapReduce Model



F 1/13




M 1/16

Connected Components, Min Spanning Tree

Karloff-Suri-Vassilvitskii, Lattanzi et al


F 1/20

Counting Triangles

Suri-Vassilvitskii, Pagh-Tsourakakis


M 1/23

Fast Greedy Algorithms in MapReduce



F 1/27

Submodular Maximization in MapReduce



M 1/30

Matroid basics & greedy algorithm


F 2/3

No class: UBC Theory Workshop



M 2/6

Submodular Maximization: Continuous Greedy



F 2/10

Submodular Maximization: Pipage Rounding


M 2/13

No class: Family Day



F 2/17

Non-monotone submodular optimization


M 2/20

No class: Reading Week


F 2/24

No class: Reading Week



M 2/27

CountSketch, Sparse Recovery



F 3/3




M 3/6

Graph Sketching



F 3/10

Locality sensitive hashing for Hamming Dist

Har-Peled, Indyk, Motwani


M 3/13

Locality sensitive hashing for L2



F 3/17

Online bipartite matching

Devanur-Jain-Kleinberg,  Mehta 6.3


M 3/20


Buchbinder-Jain-Naor,  Mehta 6.1


F 3/24

Erasure codes, Fountain codes

Mitzenmacher, McKay


M 3/27

LT codes, Raptor codes

McKay, Luby, Shokrollahi-Luby


F 3/31

Locally decodable codes

Oggier-Datta, Wellenzohn, Yekhanin



Less theoretical books

·       Lin, Dyer. "Data-Intensive Text Processing with MapReduce"

·       Rajaraman, Leskovec, Ullman. Mining of Massive Datasets


Probability Background

·       Nick’s lecture notes on Chernoff bounds


Big Data Models

·       Valiant "A Bridging Model for Parallel Computation" CACM 1990

·       Campbell, "A Survey of Models of Parallel Computation"  TR 1997

·       Karloff, Suri, Vassilvitskii: "A Model of Computation for MapReduce." SODA 2010


Big Data Algorithms

·       O'Malley "TeraByte Sort on Apache Hadoop". TR 2008

·       Kumar, Moseley, Vassilvitskii, Vattani "Fast Greedy Algorithms in MapReduce and Streaming" SPAA13, TOPC16


Big Data Graph Processing

·       Cohen. "Graph Twiddling in a MapReduce World". Report 2009

·       Lattanzi, Moseley, Suri, Vassilvitskii "Filtering: A Method for solving graph problems in MapReduce" SPAA11

·       S Suri, S Vassilvitskii "Counting triangles and the curse of the last reducer" WWW'11

·       Pagh, Tsourakakis. "Colorful triangle counting and a MapReduce implementation". IPL 12

·       Kumar, Moseley, Vassilvitskii, Vattani "Fast Greedy Algorithms in MapReduce and Streaming" SPAA13, TOPC16


Counting Triangles

·       Pagh, Tsourakakis. "Colorful triangle counting and a MapReduce implementation". IPL 12


Matroid & Submodular Function Basics

·       Jan Vondrak’s lecture notes. Lectures 7, 8, 9

·       Michel Goemans’ lecture notes

·       Chandra Chekuri’s 2010 lecture notes. Lectures 14, 15, 16, 20

·       Lex Schrijver’s lecture notes. Chapter 10

·       McCormick “Submodular Function Minimization”


Submodular Maximization

·       Krause, Golovin. “Submodular Function Maximization.” 2014

o   This is an overview with no proofs, but includes some discussion of applications.

·       Dughmi. “Submodular function: extensions, distributions and algorithms: a survey” 2009

·       Jan Vondrak’s lecture notes. Lectures 16, 17, 19

·       Nemhauser, Wolsey, Fischer. “An analysis of approximations for maximizing submodular set functions – I”

·       Calinescu, Chekuri, Pal, Vondrak. “Maximizing a Monotone Submodular Function subject to a Matroid Constraint”. STOC 2008, SICOMP 2011

·       Feige, Mirrokni, Vondrak. “Maximizing non-monotone submodular functions.” FOCS 2007, SICOMP 2011

·       Buchbinder, Feldman, Naor, Schwartz. Submodular maximization with cardinality constraints. SODA 2014



·       Count Sketch

o   Chandra Chekuri’s 2014 lecture notes. Lectures 6, 7.

o   Jelani Nelson’s lecture notes. Lecture 7.

o   McGregor, Muthukurishnan. Textbook draft. Section 3.2

o   Gilbert, Indyk. "Sparse recovery using sparse matrices". Section II.B.

o   Charikar, Chen, Farach-Colton. "Finding frequent items in data streams". ICALP 2002.

§  The original paper, but analysis is not as clean as the later expositions.

·       L0-sampling

o   Chandra Chekuri’s 2014 lecture notes. Lectures 9.

o   Jelani Nelson’s lecture notes. Lecture 7.

o   Jowhari, Saglam, Tardos. "Tight bounds for Lp samplers". PODS 2011

·       Graph Connectivity


Locality Sensitive Hashing

·       Har-Peled, Indyk, Motwani. "Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality". STOC98, FOCS01, ToC 2012

o   The journal version is significantly easier to read than the conference versions!

·       Datar, Immorlica, Indyk, Mirrokni. "Locality-sensitive hashing scheme based on p-stable distributions" SoCG 2004


Online Matching

·       Online Bipartite Matching (KVV)

o   Mehta. "Online Matching and Ad Allocation". FNTTCS 2013

§  Primal-dual analysis of Online Bipartite Matching in in Section 6.3

o   Devanur, Jain, Kleinberg. "Randomized Primal-Dual Analysis of RANKING for Online Bipartite Matching". SODA 2013

§  The source for Mehta Section 6.3

o   Birnbaum, Mathieu. “On-line Bipartite Matching Made Simple”. SIGACT News 2008

§  A simplification of KVV

o   Karp, Vazirani, Vazirani. “An Optimal Algorithm for On-line Bipartite Matching”. STOC 1990

§  The original paper. Quite hard to read, and hard to convince yourself of correctness.

·       AdWords (MSVV)

o   Mehta. "Online Matching and Ad Allocation". FNTTCS 2013

§  Primal-dual analysis of AdWords problem is in Section 6.1

o   Buchbinder, Jain, Naor. "Online Primal-Dual Algorithms for Maximizing Ad-Auctions Revenue".

§  The source for Mehta Section 6.1



·       Erasure codes

o   McKay. “Information theory, inference, and learning algorithms”. Chapter 50

o   Mitzenmacher. “Digital fountains: a survey and look forward”. ITW 2004

o   Shokrollahi, Luby. “Raptor codes” FNTTCS 2011

o   Luby. “LT codes” FOCS 2002

o   Maymounkov. “Online codes”.

·       Codes for distributed storage

o   Wellenzohn "Erasure coding in Distributed Storage Systems". TR 2015

o   Oggier, Datta. “Coding Techniques for Repairability in Networked Distributed Storage Systems”. FNTCIT 2012

o   Yekhanin “Locally decodable codes” FNTTCS 2012

o   Huang et al “Erasure coding in Windows Azure storage” USENIX ATC 2012


Upcoming topics that we might cover

Streaming Graphs

·        Charikar, O'Callaghan, Panigrahy. "Better streaming algorithms for clustering problems". STOC03

·        Feigenbaum, Kannan, McGregor, Suri, and Zhang. On graph problems in a semi-streaming model. TCS 2005

·        Goel, Kapralov, Khanna. On the communication and streaming complexity of maximum bipartite matching, SODA 2012

·        Ahn, Guha, McGregor. "Graph Sketches: Sparsifiers, Spanners and Subgraphs". PODS 2012

·        Dynamic Graph Connectivity

§  Kapron, King, Mountjoy. "Dynamic Connectivity in Polylogarithmic worst case time" SODA 2013

§  Wang. "An Improved Randomized Data Structure for Dynamic Graph Connectivity". Arxiv

·        Kapralov. Better bounds for matchings in the streaming model, SODA 2013

·        Kapralov, Khanna, Sudan. Approximating matching size from random streams, SODA 2014

·        Kapralov, Lee, Musco, Musco, Sidford. Single Pass Spectral Sparsification in Dynamic Streams, FOCS 2014

·        Crouch, Stubbs. "Improved Streaming Algorithms for Weighted Matching, via Unweighted Matching". APPROX14

·        Assadi, Khanna, Li and Yaroslavtsev.  "Maximum Matchings in Dynamic Graph Streams and the Simultaneous Communication Model". SODA 2016



Good readings for projects:
Papers that we didn’t, or probably won’t cover

Big Data Models


Big Data Algorithms


Triangle-Counting Algorithms

·       Bar-Yossef, Kumar, Sivakumar "Reductions in streaming algorithms, with an application to counting triangles in graphs". SODA02

·       Eden, Levi, Ron, Seshadhri. "Approximately Counting Triangles in Sublinear Time". FOCS15

·       McGregor, Vorotnikova, Vu. "Better algorithms for counting triangles in data streams" PODS16


Big Data Submodular


Submodular Maximization

·       Filmus, Ward. “A tight combinatorial algorithm for submodular maximization subject to a matroid constraint.” FOCS 2012

·       Buchbinder, Feldman, Naor, Schwartz. “A tight linear-time ½-approximation for unconstrained submodular maximization.” FOCS 2012

·       Ashwinkumar, Vondrak. “Fast algorithms for maximizing submodular functions.” SODA 2014

·       Buchbinder, Feldman. “Deterministic algorithms for submodular maximization problems.” SODA 2016


Streaming Algorithms

·        Kane, Nelson, Woodruff. “An optimal algorithm for the distinct elements problem”. PODS 10

·        Andoni, Krauthgamer, Onak. “Streaming algorithms via precision sampling”, FOCS 2011

·        Agarwal, Cormode, Huang, Phillips, Wei, Yi. Mergeable Summaries. PODS12, TODS13

·        Li, Nguyen, Woodruff. Turnstile Streaming Algorithms Might As Well Be Linear Sketches. STOC 14

·        Minton, Price. Improved Concentration Bounds for Count-Sketch. SODA 2014

·        Braverman, Chestnut, Ivkin, Woodruff. "Beating CountSketch for Heavy Hitters in Insertion Streams". STOC16

·        Chakraborty, Goldenberg, Koucky. Streaming Algorithms for embedding and computing edit distance in the low distance regime. FOCS 16

·        Bellazougui, Zhang, Edit Distance: Sketching, Streaming and Document Exchange. FOCS 16

·        Mossel, Kannan, Yaroslavtsev. Linear Sketching over F2. Unpublished

·        Braverman, Chestnut, Ivkin, Nelson, Wang, Woodruff. “BPTree: an L2 heavy hitters algorithm using constant memory”. PODS 2017


Graph Streaming

·       McGregor, Tench, Vorotnikova, Vu. “Densest Subgraph in Dynamic Graph Streams”. MFCS 2015



Potential Topics

·       Distributed Models

o   Algorithms for Map Reduce

o   Graphs in MapReduce model

·       Big Data Algorithms

o   MinHash, SimHash, LSH

o   Streaming Algorithms: Count Sketch, HyperLogLog, …

o   Linear Algebra: Subspace Embeddings, Fast Regression, …

·       Machine Learning

o   Stochastic Gradient Descent

o   Online Learning

o   Boosting

o   Matrix Completion

·       Online Advertising

·       Systems

o   Codes in Cloud Storage

o   Cloud Scheduling


Guidance on Topics

I thank the following people for their ideas regarding the topics.

·       Google: Vahab Mirrokni, Eric Lehman, Petar Maymounkov, Ravi Kumar, David Lowe, Nando de Freitas.

·       Microsoft: Mohit Singh.

·       UBC: Ivan Beschastnikh, Mark Schmidt.