Data at Scale

CPSC 538W
September 2013
Andrew Warfield

Course Overview

Analyses on large volumes of data are being used to improve the effectiveness of advertising, fight cancer, and solve crimes. Many organizations are scrambling to take better advantage of an emerging set of tools for large-scale data processing, and there is a huge emerging market around providing both applications and infrastructure for working with data at scale.

In this course, we will study the systems challenges that surround large-scale data collection, analysis, transformation, and access.

The premise of our discussion will be that, while there is a growing set of off-the-shelf tools available to build large-scale information systems, these tools often achieve scale and (some amount of) ease of use at the expense of efficiency. As an example of this property, Yahoo!'s record in this year's Gray Sort Benchmark manages to sort an astonishing 1.42TB of data per minute. It achieves this using 2.1 thousand servers, each with 12 spinning disks. The underlying system is achieving a disk bandwidth of less than 1% of what the hardware should be capable of.

These software systems are deeply layered, distributed, and complex. Further, the hardware that is deployed in the datacenter has undergone very significant changes over the past ten years, and does not necessarily match the assumptions with which many existing systems are designed. We will start and end the course by examining some example large-scale analytics platforms and we will spend the body of the course exploring aspects of low-level system design in order to gain a better understanding of the challenges involved in computing at very large scales.

Goals

The aim of this course is to make you think about large complex software systems from top to bottom, and to take a whole-system approach to understanding performance and efficiency at scale.

  1. Gain an understanding of the design of some large scale data analytics, storage, and related information systems and understand the thinking that guided their design.
  2. Understand the state of the art in terms of datacenter hardware, and in particular the current state of networking and storage hardware, and issues over power and heat as computing becomes more dense.
  3. Learn to evaluate performance and efficiency in large systems, and to reason about how effectively a system is making use of the physical resources that are available to it.

The course will be very discussion focused. You should expect to be participating in discussion during every single lecture.

Schedule

Mon, Sept 9

Bootstrap.


How to Read a Paper Srinivasan Keshav
Writing reviews for systems conferences Timothy Roscoe
Wed, Sept 11

Distributed Compute


MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. (OSDI'04)
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. (NDSI'12)
Mon, Sept 16 Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. (EuroSys'07)
TritonSort: A Balanced Large-Scale Sorting System. Alexander Rasmussen, George Porter, Michael Conley, Harsha V. Madhyasthay Radhika Niranjan Mysore, Alexander Pucher, Amin Vahdat. (NSDI'11)
Mihir Nanavati covering for Andy
Wed, Sept 18 All university classes cancelled.
Mon, Sept 23

Storage at Scale


FAWN: A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan. (SOSP'12)
Fast crash recovery in RAMCloud. Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. (SOSP'11)
Wed, Sept 25 Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency. Brad Calder et al. (SOSP'11)
Finding a needle in Haystack: Facebook’s photo storage. Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, Peter Vajgel. (OSDI'10)
Mon, Sept 30 The Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. (SOSP'03)
Spanner: Google's Globally-Distributed Database. James C Corbett et al. (OSDI'12)
Wed, Oct 2

Ghosts of Storage Past


Petal: Distributed virtual disks. Edward K. Lee and Chandramohan A. Thekkath. (ASPLOS'96)
Frangipani: a scalable distributed file system. Chandramohan A. Thekkath, Timothy Mann, Edward K. Lee. (SOSP'97)
Mon, Oct 7 The design and implementation of a log-structured file system. Mendel Rosenblum and John K. Ousterhout.
Overview of the Spiralog file system.James E. Johnson and William A. Laing.
Wed, Oct 9

Solid State


CORFU: A Shared Log Design for Flash Clusters. Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, and Ted Wobber, Michael Wei, John D. Davis. (NSDI'12)
Tango: Distributed Data Structures over a Shared Log. Mahesh Balakrishnan et al. (SOSP'13)
Mon, Oct 14 Thanksgiving.
Wed, Oct 16 Gecko: Contention-Oblivious Disk Arrays for Cloud Storage. Ji Yong Shin, Mahesh Balakrishnan,Tudor Marian, Hakim Weatherspoon
Linux Block IO: Introducing Multi-queue SSD Access on Multi-core Systems. Matias Bjørling, Jens Axboe, David Nellans, Philippe Bonnet. (SYSTOR'13)
Mon, Oct 21

Datacenter Networks


PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric. Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat. (SIGCOMM'09)
Integrating Microsecond Circuit Switching into the Data Center. George Porter Richard Strong Nathan Farrington Alex Forencich Pang Chen-Sun Tajana Rosing Yeshaiahu Fainman George Papen Amin Vahdaty. (SIGCOMM'13)
Jake Wires covering for Andy.
Wed, Oct 23 ServerSwitch: A Programmable and High Performance Platform for Data Center Networks. Guohan Lu, Chuanxiong Guo, Yulong Li, Zhiqiang Zhou, Tong Yuan, Haitao Wu, Yongqiang Xiong, Rui Gao, and Yongguang Zhang. (NSDI'11)
Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen Hofmann∗ , Jon Howell, and Yutaka Suzue (OSDI'12)
Mon, Oct 28 IOFlow: A Software-Defined Storage Architecture Eno Thereska, Hitesh Ballani, Greg O'Shea, Thomas Karagiannis, Antony Rowstron, Tom Talpey , Richard Black, Timothy Zhu
Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems. Amar Phanishayee, Elie Krevat, Vijay Vasudevan, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, Srinivasan Seshan. (FAST'08).
Wed, Oct 30

Consistency in the Small


Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem Marshall Kirk McKusick, Gregory R. Ganger (USENIX ATC'99)
Generalized File System Dependencies Christopher Frost, Mike Mammarella, Eddie Kohler, Andrew de los Reyes, Shant Hovsepian, Andrew Matsuoka, and Lei Zhang (SOSP 2007)
Mon, Nov 4 Recon: Verifying File System Consistency at Runtime Daniel Fryer, Kuei Sun, Rahat Mahmood, TingHao Cheng, Shaun Benjamin, Ashvin Goel, and Angela Demke Brown (FAST'12)
Optimistic Crash Consistency Vijay Chidambaram, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. (SOSP'13)
Wed, Nov 6

Consistency in the Large (or not)

When reading these papers, you should also be considering hte use of Chained replication in FAWN and the Google Spanner paper that we read earlier in the course.
Dynamo: Amazon’s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels (SOSP'07)
Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, David G. Andersen (SOSP'11)
Mon, Nov 11 Rememberance day.
Wed, Nov 13 Slack space for the moment, and to be filled depending on what area people seem to have the most interest in. Failing that we'll probably do Ceph and Ursa minor on this day.
Mon, Nov 18

Distributed Compute Revisited

Naiad: A Timely Dataflow System. Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, Martin Abadi. (SOSP'13)
Discretized Streams: Fault-Tolerant Streaming Computation at Scale. Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica. (SOSP'13)
Wed, Nov 20 Rhea: Automatic Filtering for Unstructured Cloud Storage. Rhea: Automatic Filtering for Unstructured Cloud Storage. (NSDI'13)
Robustness in the Salus Scalable Block Store. Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike Dahlin.(NSDI'13)
Mon, Nov 25 Demos
Wed, Nov 27 Demos

Operational

538W is a seminar-style course that also happens to qualify as a systems breadth requirement for UBC graduate students. This means that, accordingly, you will need to do two things to do well in the course: (1) Participate in seminar discussions, and build an interesting system. If you are unable to contribute constructively to a classroom discussion with 10-20 of your peers, or you are uncomfortable spending quite a bit of time over the next four months building and presenting a nontrivial software project, this probably isn't the course for you! If you desire to hedge against the commitment of a large systems implementation project, you may write a brief (200 words or less), but incredibly good and readable article that articulates the problems faced in terms of dealing with large amounts of data within some specific domain. This may involve a combination of summarizing relevant publications and interviewing domain experts.

Marks are allocated as follows:

  • 60% for readings, reviews, in-class discussion and presentations.
  • 40% for building, writing up, and presenting your course project as a demo for the class. This can either be marked as all 40% for your presentation write-up and demo, or as a 20/20 split if you elect to do a smaller project but also take the essay option.

Reviews: You will be responsible for submitting conference program committee-style reviews into a local instance of hotcrp by 8pm, the evening before papers are discussed in class. Once submitted, you will be able to read any reviews that have been submitted by your peers.

Project write-ups are to be brief, five-page descriptions of the problem that your system is trying to solve, the approach to implementation, a set of results in applying or evaluating the system, and a summary of relevant related work.

HOTCRP: please create yourself an account on the hotcrp server, which is at http://hotcrp.nss.cs.ubc.ca/538W/. This is the system where you are responsible for entering reviews on the papers that we discuss in class.