Distributed Systems Abstractions

538B, Fall 2021, Ivan Beschastnikh (bestchai@cs.ubc.ca)

Tue/Thu 3:30-5PM PST, in person, UBC course page

Office hours: by appointment

Course description

Distributed systems form the infrastructure for much of our daily computing experience. Popular internet services like Google search, Facebook, and Amazon are all implemented as distributed systems. Many of these systems have been repurposed to provide compute and storage as a cloud service to other companies, such as Airbnb. To bootstrap a tech startup today you simply pay for AWS or Azure to provide you with nearly unbounded and elastic capacity. Computer networks, from your home router to international ISPs, are also distributed systems: they are all in a constant state of distributed coordination. Even your multi-core laptop has much in common with a distributed system.

Being infrastructure, distributed systems are rarely in the limelight. The purpose of this course is to highlight these systems and the beauty behind their designs. I posit that to know the 'stack' and to engineer robust and high-performing systems today (and even more so in the future) requires familiarity with distributed systems.

This course will cover a broad range of topics. We will often look back to classic papers that introduced core concepts that structure many of the existing designs. We will also discuss contemporary papers that document the systems powering commercial services such as GMail.

Unlike standard courses on distributed systems, this course will focus on abstractions. Dijkstra has a wonderful quote that gets to the heart of abstractions in computer science:

The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

Powerful abstractions are a common theme in distributed systems (and really all software and hardware computing systems). This course will focus on a select number of abstractions that have persisted over time and continue to influence modern system designs. For example, some abstractions capture notions of coordination (consensus), others consider fault tolerance (replicated state machines) and consistency semantics (eventual consistency). These abstractions are worth studying because each one provides an accessible entry into the design of what are generally highly complex artifacts.

Course-level learning goals

By the end of this course participants will be able to

understand key principles involved in designing and implementing distributed systems
reason about problems that involve distributed components
use abstractions to solve problems that arise in distributed contexts
prototype advanced distributed systems

In-person format

The discussion in each class will focus on 1-2 papers
Everyone must post a response for each of the assigned papers to Piazza, due at least 18 hours before class

Schedule (a work in progress)

Sep 9 Thu	Introductions, course overview, and academic paper reading How to Read a Paper, Srinivasan Keshav. Writing reviews for systems conferences, Timothy Roscoe. Do not write responses for these readings. Other introductory readings: Google. Introduction to Distributed System Design.
Sep 14 Tue	Clocks and ordering of events Baldoni and Raynal. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems. IEEE Distributed Systems Online 2002.	A1: Yanze S1: Ben
Sep 16 Thu	Distributed snapshots Chandy and Lamport. Distributed Snapshots: Determining the Global States of a Distributed System. TOCS 1985.	A1: Zack S1: Alex
Sep 21 Tue	Distributed state Ousterhout. The Role of Distributed State. TR 1990.	A1: Narun S1: Tarcisio
Sep 23 Thu	Replicated state machines Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. CSUR 1990. Project speed dating in class.	A1: Aryan S1: Ray
Sep 28 Tue	Consensus: Paxos Lamport. Paxos Made Simple. 2001 Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006. Related: Paxos Made Practical Paxos Made Live - An Engineering Perspective (More Chubby experiences) Hunt et al. ZooKeeper: Wait-free coordination for Internet-scale systems. USENIX ATC 2010. Tolerating latency in replicated state machines through client speculation, NSDI 2009.	A1: Narun S1: --- A2: Preeti S2: Minglong
Sep 30 Thu	*National Day for Truth and Reconciliation: UBC closed*
Oct 5 Tue	Byzantine Fault Tolerance Castro and Liskov. Practical Byzantine Fault Tolerance Related: Lamport, Shostak, and Pease. The Byzantine Generals Problem Project proposal drafts due October 5th at 18:00 PST	A1: Shayan S1: Eric
Oct 7 Thu	Distributed shared memory Feeley et al. Implementing global memory management in a workstation cluster Related: TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems GMS project publications	A1: Niloo S1: Ray
Oct 12 Tue	Programming frameworks: Argus Liskov. Distributed Programming in Argus. CACM 1988.	A1: Braxton S1: Minglong
Oct 14 Thu	Pub-Sub primitive Eugster et al. The Many Faces of Publish/Subscribe Finalized project proposals due October 15th at 18:00 PST	A1: Alex S1: Sadaf
Oct 19 Tue	Language support Holt et al. Disciplined Inconsistency with Consistency Types. SoCC 2016. Related: Killian et al. Mace: Language Support for Building Distributed Systems. PLDI 2007.	A1: Eric S1: Joe
Oct 21 Thu	Verification Wilcox et al. Verdi: A Framework for Implementing and Formally Verifying Distributed Systems. PLDI 2015. Related: Hawblitzel et al. IronFleet: Proving Practical Distributed Systems Correct. SOSP 2015.	A1: Yanze S1: Joe
Oct 26 Tue	Model checking Yang et al. MODIST: Transparent Model Checking of Unmodified Distributed Systems. NSDI 2009. Related: Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code. NSDI 2007.	A1: Tarcisio S1: Preeti
Oct 28 Thu	Tracing Sigelman et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. TR 2010. First project milestone due October 29th at 18:00 PST	A1: Braxton S1: Nalin
Nov 2 Tue	Dataflow: Spark Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. NSDI 2012. Related: Dean and Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004. Scalability! But at what COST?	A1: Niloo S1: Milad
Nov 4 Thu	Dataflow: TensorFlow Abadi et al. TensorFlow: A System for Large-Scale Machine Learning. OSDI 2016. Related: Scaling distributed machine learning with the Parameter Server. OSDI 2014.	A1: Haley S1: Shadab
Nov 9 Tue	CAP theorem Gilbert and Lynch. Perspectives on the CAP Theorem. Computer J. 2012. Brewer. CAP Twelve Years Later: How the "Rules" Have Changed Related Corbett et al. Spanner: Google's Globally-Distributed Database. OSDI 2012.	A1: Nalin S1: Shayan A2: Shizuko S2: Sadaf
Nov 11 Thu	*Fall midterm break* Second project milestone due November 15th at 18:00 PST Email to schedule meetings with Ivan due November 15th at 18:00 PST
Nov 16 Tue	Weak consistency: eventual consistency and CRDTs Vogels. Eventually consistent Shapiro et al. Conflict-free Replicated Data Types. TR 2011. Related: Video: Strong Eventual Consistency and Conflict-free Replicated Data Types Readings in conflict-free replicated data types Demers et al. Epidemic algorithms for replicated database maintenance Eventual Consistency Today: Limitations, Extensions, and Beyond	A1: Sepand S1: Ghazal A2: Aryan S2: Sepand
Nov 18 Thu	Distributed Hash Tables Stoica et al. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. TON 2003. Related: Kademlia: A P2P Information System Based on the XOR Metric Using Lightweight Modeling To Understand Chord	A1: Milad S1: Ghazal
Nov 23 Tue	P2P: BitTorrent and BitCoin Cohen. Incentives build robustness in BitTorrent Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System Nielsen. How the Bitcoin protocol actually works (2-column pdf version) Related: Levin et al. BitTorrent is an Auction: Analyzing and Improving BitTorrent's Incentives Dwork and Naor. Pricing via Processing or Combatting Junk Mail	A1: Shizuko S1: Haotian A2: Mishaal S2: Haotian
Nov 25 Thu	P2P: Ethereum Buterin. A Next-Generation Smart Contract and Decentralized Application Platform. 2013. Third project milestone due November 29th at 18:00 PST	A1: --- S1: Zack
Nov 30 Tue	Privacy: Tor Dingledine et al. Tor: The Second-Generation Onion Router Christin. Traveling the silk road: a measurement analysis of a large anonymous online marketplace.	A1: Shadab S1: Ben A2: Haley S2: Mishaal
Dec 2 Tue	In-class project presentations
Dec 7 Tue	In-class project presentations Final project reports due December 21st at 18:00 PST

Paper responses

For each of the assigned readings in the schedule above you must compose a 1-2 paragraph response. (The schedule sometimes lists optional readings; you do not need to respond to these). You should post your response on Piazza at least 18 hours before class in the thread with the title/date for the paper. See paper responses instructions for more information.

Everyone will have access to all the other students' response submissions. Please read them before class. Reading the other responses is a good way for you to gain perspective. You can see what you missed (or what other people missed), and whether you agree with their take on the key ideas. It will help to make the class sessions more productive.

The response will be graded using the following scale:

1 : the response engages with the reading in some depth
0 : no response was posted, or it was posted late, or the response is unreadable, off topic, or offers no insight into the reading and/or is a simple re-statement of some text in the reading (e.g., the abstract).

Everyone must sign up to be an advocate/skeptic for a reading at least once during the term (likely 2-3 times depending on the final number of students in the class). If you are an advocate or a skeptic for the assigned reading, then you do not have to submit a response. See information about advocate and skeptic roles.

The advocate/skeptic roles will be graded using the following scale:

1 : the advocate/skeptic helped to promote in-class discussion and illustrated in-depth understanding of the reading.
0 : advocate/skeptic did not attend class, did not seem to read the assigned reading, or was entirely un-engaged during in-class discussion.

Project

The project must address a non-trivial problem relevant to distributed systems. The project can resolve the problem by building a system, by collecting data/carrying out experiments, by developing algorithms and proving them correct, etc. I strongly prefer that you do your project in a team of 2-4 people.

Here are some projects ideas (do not limit yourself to these!):

Build a fault-tolerant parallel computing platform.
Build a peer-to-peer version of DropBox.
Build a record-replay tool for distributed Go programs.
Build a distributed state assertions library for Go.
Build an anonymity system based on onion routing.
Build a robust prototype of an interesting routing, replication, fault tolerance, leader election or other distributed algorithm from our readings, and thoroughly evaluate its performance, availability, fault tolerance, etc.
Formalize an existing implementation/proposal of a non-trivial routing, replication, fault tolerance, leader election or other distributed algorithm and prove that it is correct (or incorrect) in Coq.
Create a robust protocol fuzzing tool and apply it to popular distributed system prototypes.
Create a visualization tool that explains executions of distributed systems. Evaluate the tool on several existing systems.

I especially encourage project proposals that span other areas of computer science. Schedule a time to chat with me if you have an idea, or if you have a topic but need to translate it into a project idea.

The required project deliverables are listed below. In cases where the deliverable is a written paper, I would prefer that you share the doc with Ivan as an editable google doc. If you would rather use another submission approach, let me know. I would prefer final project reports in pdf format for an ACM SIG of your choice.

Project proposal draft: a paper outlining as much of the project proposal as possible. See proposal advice for more information on what is expected in the proposal draft. The draft does not need to have well-defined milestones.
Project proposal: a paper detailing the problem, your proposed approach/solution, and a realistic timeline for your team. See proposal advice for more information. The proposal must detail three well-defined milestones. Each milestone must include (1) deliverables that you will share with me for the milestone, (2) a written document that explains the deliverables and their status. The best way to think of the milestones is as a contract: if I accept your proposal and you meet the milestone you describe, then you will receive the full mark for the milestone.
1st project milestone: 1st milestone deliverables and a document that explains the deliverables and their status. For example, if the deliverable is code, then the document must describe what the code does, under what conditions it works, how to run and deploy the code, etc. If the deliverable is a dataset, then the document must explain the format of the data, what it means, how it was collected, how complete it is, etc.
2nd project milestone: 2nd milestone deliverables and a document that explains the deliverables and their status.
3rd project milestone: 3rd milestone deliverables and a document that explains the deliverables and their status.
Project update meeting: Your project group will meet with me to discuss your project status. We will discuss the first two milestones, your progress towards the third milestones, and any outstanding questions/concerns that you might have.
Project presentation: a presentation describing your project. I encourage you to demo your project during your talk. We will have presentations across two days. Each presentation will be followed by a QA period. Presentation timing details TBD.
All groups should provide every team member with an equal amount of time to speak during the presentation slot. I don't care who presents what in the talk, but (1) everyone must speak about the project, and (2) everyone must speak for an equal amount of time.

During the QA, I hope to hear from everyone on the team, as well. But, in some cases (e.g., technical questions directed at a specific part of the project) a specific person may be the only one who can properly answer a question, and that's fine.

You can compose your slides/presentation in whatever format you wish. You can even present a demo if you can. An example outline of a presentation that I would expect would look something like this (think technical talk):
- 1. Motivation: what did you work on this topic/high-level view of the project topic (esp. in the context of this course)
- 2. Background: anything that the audience needs to know to understand your talk
- 3. Requirements/goals: what were you intending to build
- 4. Design: what was the design of what you've build
- 5. Implementation: any interesting implementation details
- 6. Evaluation methodology: what questions were you intending to answer in the eval/what was the structure of your eval
- 7. Evaluation results: results from your evaluation
Prototype implementation/experiments: must involve substantial development/experimental effort. The final prototype/data must be shared with Ivan preferably as a git repository at the same time as the project report.
Project report: a paper detailing the problem, your approach/solution, your prototype/experiments, and analysis/evaluation. The final report (due at the end of the term) should be no longer than 10 pages (excluding references) and should resemble a research paper. See final report instructions for more information.

The project is structured as a series of regularly occurring deadlines, listed in the schedule above and below. Do not miss these! The deadline deliverable must be submitted by email to the instructor by 6PM on the day of the deadline.

Timeline of project deliverables:

October 5: Project proposal drafts
October 15: Finalized project proposals
October 29: 1st project milestone
November 15: 2nd project milestone + email to schedule meeting with Ivan
November 29: 3rd project milestone
December 2,7: In-class project presentations
December 21 : Project reports and code

Grading

Final course mark will be based off of class participation, paper responses, and project deliverables.

Class/piazza participation: 10%
Paper responses: 30%
Project: 60%

Draft proposal: 5%
Final proposal: 10%
1st milestone: 7%
2nd milestone: 7%
3rd milestone: 7%
Presentation: 4%
Final report and code: 20%

All members of the project team will receive the same mark for all of the project components. If you are experiencing difficulties with your team, please approach me as early as possible so that I can help.

The proposal, presentation, and final report/code are all required components. You cannot pass the course without having each of these. Specifically, you will receive a 0 for the project if any of these are missing.

The mark for class/piazza participation (10%) is based on four factors:

Regular course attendance and any contributions to piazza beyond paper responses: 5%
Regular participation in the in-class paper discussions: 3%
Helping to lead discussion during advocate/skeptic roles: 2%

I am happy to discuss situations that make it difficult for you to attend class. Please reach out to me.

How to do well in this course

Be prepared to participate in in-class discussion. This is a seminar-style course, which means that most of the class time will be devoted to discussion. The best way to prepare for class is to read the assigned paper(s), write a thoughtful response, and then read and carefully consider the responses submitted by your peers. Periodically re-read the readings from the first day of class and work to improve your paper reading and responding abilities.

Plan your reading time. The readings will likely challenge you. I recommend allocating an explicit time slot each week for reading the papers and for thinking about the papers. Note that some readings will be more difficult than others. Jump ahead and note the readings that are particularly long, theoretical, or may be especially challenging to you.

Invest time into the project. Put in consistent and weekly effort into the project. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Reach out for success and be proactive. There are no explicit office hours for this course. Email and schedule a time to chat with the instructor to discuss the course, the project, etc.

University students often encounter setbacks from time to time that can impact academic performance. Discuss your situation with your instructor or an academic advisor. Learn about how you can plan for success at: www.students.ubc.ca. For help addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit: https://students.ubc.ca/health-wellness

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Paper responses. Paper responses must be written individually. You are free to discuss the readings with other students, but write your responses on your own. Cite and attribute points from discussions with other students or external sources that you have read in your response.

Projects. You are free to use any code you find in your project. However, a non-trivial fraction of functionality in your prototype must be constructed by your team. You must cite and attribute sources of the code that you borrow/utilize in your project.

If you do discuss the project outside your team or use external resources (e.g., a StackOverflow question) then you must cite and attribute your sources in a README distributed with your project. Stating the source is insufficient: you should explain what was discussed/found and how you have used this information in your project.