Distributed Systems

Distributed Systems

538B, Winter 2019, Ivan Beschastnikh (bestchai@cs.ubc.ca)

Tue/Thu 930-11AM, MacLeod 220, UBC course page

Course piazza

Office hours by appointment


Course description

Distributed systems form the infrastructure for much of our daily computing experience. Popular internet services like Google search, Facebook, and Amazon are all implemented as distributed systems. Many of these systems have been repurposed to provide compute and storage as a cloud service to other companies, such as Airbnb. To bootstrap a tech startup today you simply pay for AWS or Azure to provide you with nearly unbounded and elastic capacity. Computer networks, from your home router to international ISPs, are also distributed systems: they are all in a constant state of distributed coordination. Even your multi-core laptop has much in common with a distributed system.

Being infrastructure, distributed systems are rarely in the limelight. The purpose of this course is to highlight these systems and the beauty behind their designs. I posit that to know the 'stack' and to engineer robust and high-performing systems today (and even more so in the future) requires familiarity with distributed systems that ensnare our computing devices.

This course will cover a broad range of topics. We will often look back to classic papers that introduced core concepts that structure many of the existing designs. We will also discuss contemporary papers that document the systems powering commercial services such as GMail.

Besides paper readings the first half of the term will include hands-on assignments. These will get you to apply the concepts from the readings to build small but practical distributed system prototypes of your very own. In the second half of the term the assignment will be replaced with an open-ended project.

In this course we will be exclusively using the Go programming language for all assignments and the project. Go was designed with distributed systems in mind and you will come to appreciate this. Learning a new programming language is also an important skill and this course presents an opportunity to practice this skill. Note that Go will not be explicitly covered by the course. I assume that students can pick up the necessary language features as needed.

This course satisfies the PhD "Computer Systems and Design" breadth requirement.

Course-level learning goals

The course will provide an opportunity for participants to
  • understand key principles involved in designing and implementing distributed systems
  • reason about problems that involve distributed components
  • become familiar with important techniques for solving problems that arise in distributed contexts
  • develop advanced distributed systems using the Go programming language

Format

This seminar-style course will be primarily driven through in-class discussion of the readings. The readings are available online and are linked to from the schedule below (a work in progress). Most of the readings are research papers; there is no textbook. The course will also include an open-ended course project (the bulk of your course mark) that must be completed as part of a team.

  • The discussion in each class will focus on 1-2 papers
  • Everyone must post a response for each of the assigned papers to Piazza, due at least 24 hours before class

If you are in the course, you should start with the following tasks:

  • Complete the Go tutorial and start learning Go.
  • Sign up for the class Piazza instance. We will use this for all class communication.

Schedule (a work in progress)

Jan 3
Thu
Introductions, course overview, and academic paper reading
  1. How to Read a Paper, Srinivasan Keshav.
  2. Writing reviews for systems conferences, Timothy Roscoe.
Do not write responses for these readings.
Jan 8
Tue
Clocks and ordering of events
  1. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. CACM 1978. (read up to Anomalous Behavior section).
  2. Baldoni and Raynal. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems. IEEE Distributed Systems Online 2002.
A1:Swati
S1:Hassan

A2:Chris
S2:Puneet
Jan 10
Thu
Distributed snapshots
  1. Chandy and Lamport. Distributed Snapshots: Determining the Global States of a Distributed System. TOCS 1985.
A1:Matthew
S1:Christian
Jan 15
Tue
Distributed state
  1. Ousterhout. The Role of Distributed State. TR 1990.
A1: Felix
S1: Swati
Jan 17
Thu
Replicated state machines
  1. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. CSUR 1990.
A1: X
S1: X
Jan 22
Tue
Distributed shared memory 1/2

(Ivan out of town; Nodir and Fabian lead)

  1. Li. IVY: A Shared Virtual Memory System for Parallel Computing. ICPP 1988.
  2. Feeley et al. Implementing global memory management in a workstation cluster. SOSP 1995.
A1:
S1:

A2:
S2:
Jan 24
Thu
Distributed shared memory 2/2

(Ivan out of town; Nodir and Fabian lead)

  1. Keleher et al. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. WTEC 1994.
A1:
S1:
Jan 29
Tue
Consensus theory
  1. Impossibility of Distributed Consensus with One Faulty Process (FLP result). JACM 1985
  2. Ben-Or. Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols. PODC 1983.
A1:
S1:

A2:
S2:
Jan 31
Thu
Paxos
  1. Lamport. Paxos Made Simple. 2001
A1:
S1:
Feb 5
Tue
Coordination services
  1. Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006.
  2. Hunt et al. ZooKeeper: Wait-free coordination for Internet-scale systems. USENIX ATC 2010.
A1:
S1:

A2:
S2:
Feb 7
Thu
Raft and viewstamped replication
  1. Ongaro and Ousterhout. In Search of an Understandable Consensus Algorithm. USENIX 2014.
  2. Liskov. Viewstamped Replication Revisited. Springer-Verlag 2010
A1:
S1:

A2:
S2:
Feb 12
Tue

No class (Snow day)

Feb 14
Tue
Programming frameworks
  1. Liskov. Distributed Programming in Argus. CACM 1988.
  2. Jul et al. Fine-grained mobility in the Emerald system. TOCS 1988.
A1:
S1:

Feb 19
Tue

No class (Reading week)

Feb 21
Tue

No class (Reading week)

Assignment 2 due Friday, Feb 22, via email.
Feb 26
Tue
Languages and model checking
  1. Killian et al. Mace: Language Support for Building Distributed Systems. PLDI 2007.
  2. Yang et al. MODIST: Transparent Model Checking of Unmodified Distributed Systems. NSDI 2009.

(Ivan out of town; Renato and Matthew lead)

A1:
S1:

A2:
S2:

Feb 28
Tue
Verification
  1. Hawblitzel et al. IronFleet: Proving Practical Distributed Systems Correct. SOSP 2015.

(Ivan out of town; Renato and Matthew lead)

A1:
S1:

Mar 5
Tue
Debugging and tracing
  1. Aguilera et al. Performance debugging for distributed systems of black boxes. SOSP 2003.
  2. Sigelman et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. TR 2010.
A1:
S1:

A2:
S2:
Mar 7
Thu
Pub-Sub primitive
  1. Eugster et al. The Many Faces of Publish/Subscribe. CSUR 2003.
A1:
S1:
Mar 12
Tue
Distributed Hash Tables
  1. Stoica et al. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. TON 2003.
  2. Dabek et al. Wide-area cooperative storage with CFS. SOSP 2001.
Project proposals due March 12 11:59PM
A1:
S1:

A2:
S2:
Mar 14
Thu
CAP theorem
  1. Gilbert and Lynch. Perspectives on the CAP Theorem. Computer J. 2012.
A1:
S1:
Mar 19
Tue
Weak consistency: optimistic replication
  1. Saito and Shapiro. Optimistic replication. CSUR 2005.
A1:
S1:
Mar 21
Thu
Weak consistency: CRDTs
  1. Shapiro et al. Conflict-free Replicated Data Types. TR 2011.
A1:
S1:
Mar 26
Tue
MapReduce and Spark
  1. Dean and Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
  2. Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. NSDI 2012.
A1:
S1:

A2:
S2:
Mar 28
Thu
TensorFlow
  1. Abadi et al. TensorFlow: A System for Large-Scale Machine Learning. OSDI 2016.
A1:
S1:
Apr 2
Tue
P2P: BitTorrent and Tor
  1. Cohen. Incentives build robustness in BitTorrent
  2. Dingledine et al. Tor: The Second-Generation Onion Router
A1:
S1:

A2:
S2:
Apr 4
Thu
P2P: BitCoin

(Last day of classes)

  1. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System
  2. Eyal et al. Bitcoin-NG: A Scalable Blockchain Protocol. NSDI 2016.
A1:
S1:

A2:
S2:

Week of Apr 8th

Project presentations during a day this week

Apr 19

Final project reports and code due 11:59PM

Paper responses

For each of the assigned readings in the schedule above you must compose a 1-2 paragraph response. (The schedule sometimes lists optional readings; you do not need to respond to these). You should post your response on Piazza at least 24 hours before class in the thread with the title/date for the paper.

See paper responses instructions for more information.

You will have access to all the other students' submissions. Please read them, because reading the other responses is a good way for you to get perspective. You can see what you missed (or what other people missed), and whether you agree with their take on the key ideas. It will help to make the class sessions more productive.

The response will be graded using the following scale:

  • 1 : the response engages with the reading in some depth
  • 0 : no response was posted, or it was posted late, or the response is unreadable, off topic, or offers no insight into the reading and/or is a simple re-statement of some text in the reading (e.g., the abstract).

Late responses will not be accepted.

Assignments

Assignment 1: Replicated key-value store

Assignment 2: Formal specification of a replicated key-value store

UBC GitHub submission instructions

We will use an enterprise version of GitHub at UBC for all assignment/project code and writeup submissions.

To hand in your code you will need to precisely follow a couple of steps. First, log into github.ugrad.cs. Notice that you are part of the CPSC538B-2018W-T2 org. This is the org under which you will create all of your repositories in this course. All students and staff in the course have accounts under this org.

For repository/submission of a course deliverable you must create a new private repository under the above org. For A1 the repository must have the name A1-[USERID] where you should replace [USERID] with your cs undergraduate login. For A2 (assignment 2), etc you will use a similar format. Here is a picture of what your screen should look like when you create your repo and the different formats you should use for different course deliverables. Use these formats exactly as shown:



Work inside your repo as you would usually. We will mark the commit that immediately precedes the deadline time.

Project

The project must address a non-trivial problem relevant to distributed systems. The project can resolve the problem by building a system, by collecting data/carrying out experiments, by developing algorithms and proving them correct, etc. I strongly prefer that you do your project in a team of 2-3 people, but this is not a strict requirement.

Here are some projects ideas (do not limit yourself to these!):

  • Build a fault-tolerant parallel computing platform.
  • Build a peer-to-peer version of DropBox.
  • Build a record-replay tool for distributed Go programs.
  • Build a distributed state assertions library for Go.
  • Build an anonymity system based on onion routing.
  • Build a robust prototype of an interesting routing, replication, fault tolerance, leader election or other distributed algorithm from our readings, and thoroughly evaluate its performance, availability, fault tolerance, etc.

I especially encourage project proposals that span other areas of computer science. Meet and chat with me if you have an idea, or if you have a topic but need to translate it into a project idea.

There are four project deliverables:

  • Project proposal: a paper detailing the problem, your proposed approach/solution, and a realistic timeline for your team. See proposal advice for more information.
  • Prototype implementation/experiments: must involve substantial development/experimental effort. The final prototype/data must be shared with the instructor (at the end of the term), preferably as an hg or git repository.
  • Project presentation: a 30 minute presentation describing your project. I encourage you to demo your project (if you built a system) during your talk. Presentation details TBD.
  • Project report: a paper detailing the problem, your approach/solution, your prototype/experiments, and analysis/evaluation. The final report (due at the end of the term) should be no longer than 10 pages and should resemble a research paper. See final report instructions for more information.

The project is structured as a series of regularly occurring deadlines, listed in the schedule above and below. Do not miss these! The deadline deliverable must be submitted by email to the instructor by 6PM on the day of the deadline.

Timeline of project deliverables:

  • Before March 12 : Project proposal drafts shared with Ivan as google docs
  • March 12 : Project proposals
  • Week of April 8 : Project presentations (about 30 minutes per team)
  • April 19 : Project reports and code

Grading

Final course mark will be based off of:

  • Class participation: 10%
  • Paper responses: 20%
  • Assignment(s): 10%
  • Project: 60%
    • Proposal: 20% (team)
    • Presentation: 5% (team) + 5% (individual)
    • Final report and code: 30% (team)

Note that the project must be a team effort. The team's mark for the proposal and final report is the same for all team members. For project presentations each team member will receive a team mark and an individual mark.

The mark for class participation is based on three factors:

  • Regular course attendance
  • Regular participation in the in-class paper discussions
  • Leading of discussion for advocate/skeptic assignments

How to do well in this course

Be prepared to participate in in-class discussion. This is a seminar-style course, which means that most of the class time will be devoted to discussion. The best way to prepare for class is to read the assigned paper(s), write a thoughtful response, and then read and carefully consider the responses submitted by your peers. Periodically re-read the readings from the first day of class and work to improve your paper reading and responseing abilities.

Invest the time into learning Go. As with any new programming language, your proficiency in the language will highly correlate with the amount of code you write: practice counts for a lot!

Plan you reading time. The readings will likely challenge you. I recommend allocating an explicit time slot each week for reading the papers and for thinking about the papers. Note that some readings will be more difficult than others. Jump ahead and note the readings that are particularly long, theoretical, or may be especially challenging to you.

Invest time into the project. Do not underestimate the importance of a thorough (and interesting!) project proposal. Proposal write-ups that are vague or are incomplete will not be accepted. Put in consistent and weekly effort into the project. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Reach out for success and be proactive. There are no explicit office hours for this course. Email and schedule a time to meet with the instructor to discuss the course, the project, etc.

University students often encounter setbacks from time to time that can impact academic performance. Discuss your situation with your instructor or an academic advisor. Learn about how you can plan for success at: www.students.ubc.ca. For help addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit: https://students.ubc.ca/health-wellness

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Paper responses. Paper responses must be written individually. You are free to discuss the readings with other students, but write your responses on your own. Cite and attribute points from discussions with other students or external sources that you have read in your response.

Assignments. You can talk with other students about the assignments. You cannot look at each other's code and you cannot share code. You can use Google and other resources to find code examples, but you should not rely on these directly by copy and pasting: you must implement your own solution from scratch. You can use standard Go libraries (as long as they don't directly solve the assignment for you); you cannot use any external libraries that are not distributed with Go.

If you do discuss the assignment with someone or use external resources (e.g., a StackOverflow question) then you must cite and attribute your sources in a README distributed with your assignment (or in body of email with your solution). Stating the source is insufficient: you should explain what was discussed/found and how you have used this information in your assignment solution.

Projects. The policy for project is more liberal than for assignments: you are free to use any code you find in your project (e.g., you can use external libraries that are not distributed with Go). However, a non-trivial fraction of functionality in your prototype must be constructed by your team. As with assignments, cite and attribute sources of the code that you borrow/utilize in your project.