Distributed Systems Abstractions

Distributed Systems Abstractions

538B, Fall 2021, Ivan Beschastnikh (bestchai@cs.ubc.ca)

Tue/Thu 3:30-5PM PST, in person, UBC course page

Course piazza, Course canvas

Office hours: by appointment


Course description

Distributed systems form the infrastructure for much of our daily computing experience. Popular internet services like Google search, Facebook, and Amazon are all implemented as distributed systems. Many of these systems have been repurposed to provide compute and storage as a cloud service to other companies, such as Airbnb. To bootstrap a tech startup today you simply pay for AWS or Azure to provide you with nearly unbounded and elastic capacity. Computer networks, from your home router to international ISPs, are also distributed systems: they are all in a constant state of distributed coordination. Even your multi-core laptop has much in common with a distributed system.

Being infrastructure, distributed systems are rarely in the limelight. The purpose of this course is to highlight these systems and the beauty behind their designs. I posit that to know the 'stack' and to engineer robust and high-performing systems today (and even more so in the future) requires familiarity with distributed systems.

This course will cover a broad range of topics. We will often look back to classic papers that introduced core concepts that structure many of the existing designs. We will also discuss contemporary papers that document the systems powering commercial services such as GMail.

Unlike standard courses on distributed systems, this course will focus on abstractions. Dijkstra has a wonderful quote that gets to the heart of abstractions in computer science:

The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

Powerful abstractions are a common theme in distributed systems (and really all software and hardware computing systems). This course will focus on a select number of abstractions that have persisted over time and continue to influence modern system designs. For example, some abstractions capture notions of coordination (consensus), others consider fault tolerance (replicated state machines) and consistency semantics (eventual consistency). These abstractions are worth studying because each one provides an accessible entry into the design of what are generally highly complex artifacts.

Course-level learning goals

By the end of this course participants will be able to
  • understand key principles involved in designing and implementing distributed systems
  • reason about problems that involve distributed components
  • use abstractions to solve problems that arise in distributed contexts
  • prototype advanced distributed systems

In-person format

  • The discussion in each class will focus on 1-2 papers
  • Everyone must post a response for each of the assigned papers to Piazza, due at least 18 hours before class

Schedule (a work in progress)

Sep 9
Thu
Introductions, course overview, and academic paper reading
  1. How to Read a Paper, Srinivasan Keshav.
  2. Writing reviews for systems conferences, Timothy Roscoe.
Do not write responses for these readings.
Sep 14
Tue
Clocks and ordering of events
  1. Baldoni and Raynal. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems. IEEE Distributed Systems Online 2002.
A1: Yanze
S1: Ben
Sep 16
Thu
Distributed snapshots
  1. Chandy and Lamport. Distributed Snapshots: Determining the Global States of a Distributed System. TOCS 1985.
A1: Zack
S1: Alex
Sep 21
Tue
Distributed state
  1. Ousterhout. The Role of Distributed State. TR 1990.
A1:
S1:
Sep 23
Thu
Replicated state machines
  1. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. CSUR 1990.
Project speed dating in class.
A1:
S1:
Sep 28
Tue
Consensus: Paxos
  1. Lamport. Paxos Made Simple. 2001
  2. Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006.
A1:
S1:

A2:
S2:
Sep 30
Thu
National Day for Truth and Reconciliation: UBC closed
Oct 5
Tue
Byzantine Fault Tolerance
  1. Castro and Liskov. Practical Byzantine Fault Tolerance
Project proposal drafts due October 5th at 18:00 PST
A1:
S1:
Oct 7
Thu
Distributed shared memory
  1. Feeley et al. Implementing global memory management in a workstation cluster
Related:
A1:
S1:
Oct 12
Tue
Programming frameworks: Argus
  1. Liskov. Distributed Programming in Argus. CACM 1988.
A1:
S1:
Oct 14
Thu
Pub-Sub primitive
  1. Eugster et al. The Many Faces of Publish/Subscribe
Finalized project proposals due October 15th at 18:00 PST
A1:
S1:
Oct 19
Tue
Language support
  1. Holt et al. Disciplined Inconsistency with Consistency Types. SoCC 2016.
A1:
S1:
Oct 21
Thu
Verification
  1. Wilcox et al. Verdi: A Framework for Implementing and Formally Verifying Distributed Systems. PLDI 2015.
A1:
S1:
Oct 26
Tue
Model checking
  1. Yang et al. MODIST: Transparent Model Checking of Unmodified Distributed Systems. NSDI 2009.
A1:
S1:
Oct 28
Thu
Tracing
  1. Sigelman et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. TR 2010.
First project milestone due October 29th at 18:00 PST
A1:
S1:
Nov 2
Tue
Dataflow: Spark
  1. Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. NSDI 2012.
A1:
S1:
Nov 4
Thu
Dataflow: TensorFlow
  1. Abadi et al. TensorFlow: A System for Large-Scale Machine Learning. OSDI 2016.
A1:
S1:
Nov 9
Tue
CAP theorem
  1. Gilbert and Lynch. Perspectives on the CAP Theorem. Computer J. 2012.
  2. Brewer. CAP Twelve Years Later: How the "Rules" Have Changed
A1:
S1:

A2:
S2:
Nov 11
Thu
Fall midterm break

Second project milestone due November 15th at 18:00 PST
Email to schedule meetings with Ivan due November 15th at 18:00 PST
Nov 16
Tue
Weak consistency: eventual consistency and CRDTs
  1. Vogels. Eventually consistent
  2. Shapiro et al. Conflict-free Replicated Data Types. TR 2011.
A1:
S1:

A2:
S2:
Nov 18
Thu
Distributed Hash Tables
  1. Stoica et al. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. TON 2003.
A1:
S1:
Nov 23
Tue
P2P: BitTorrent and BitCoin
  1. Cohen. Incentives build robustness in BitTorrent
  2. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System
A1:
S1:

A2:
S2:
Nov 25
Thu
P2P: Ethereum
  1. Buterin. A Next-Generation Smart Contract and Decentralized Application Platform. 2013.
Third project milestone due November 29th at 18:00 PST
A1:
S1:
Nov 30
Tue
Privacy: Tor
  1. Dingledine et al. Tor: The Second-Generation Onion Router
  2. Christin. Traveling the silk road: a measurement analysis of a large anonymous online marketplace.
A1:
S1:

A2:
S2:
Dec 2
Tue

In-class project presentations

Dec 7
Tue

In-class project presentations

Final project reports due December 21st at 18:00 PST

Paper responses

For each of the assigned readings in the schedule above you must compose a 1-2 paragraph response. (The schedule sometimes lists optional readings; you do not need to respond to these). You should post your response on Piazza at least 18 hours before class in the thread with the title/date for the paper. See paper responses instructions for more information.

Everyone will have access to all the other students' response submissions. Please read them before class. Reading the other responses is a good way for you to gain perspective. You can see what you missed (or what other people missed), and whether you agree with their take on the key ideas. It will help to make the class sessions more productive.

The response will be graded using the following scale:

  • 1 : the response engages with the reading in some depth
  • 0 : no response was posted, or it was posted late, or the response is unreadable, off topic, or offers no insight into the reading and/or is a simple re-statement of some text in the reading (e.g., the abstract).

Everyone must sign up to be an advocate/skeptic for a reading at least once during the term (likely 2-3 times depending on the final number of students in the class). If you are an advocate or a skeptic for the assigned reading, then you do not have to submit a response. See information about advocate and skeptic roles.

The advocate/skeptic roles will be graded using the following scale:

  • 1 : the advocate/skeptic helped to promote in-class discussion and illustrated in-depth understanding of the reading.
  • 0 : advocate/skeptic did not attend class, did not seem to read the assigned reading, or was entirely un-engaged during in-class discussion.

Project

The project must address a non-trivial problem relevant to distributed systems. The project can resolve the problem by building a system, by collecting data/carrying out experiments, by developing algorithms and proving them correct, etc. I strongly prefer that you do your project in a team of 2-4 people.

Here are some projects ideas (do not limit yourself to these!):

  • Build a fault-tolerant parallel computing platform.
  • Build a peer-to-peer version of DropBox.
  • Build a record-replay tool for distributed Go programs.
  • Build a distributed state assertions library for Go.
  • Build an anonymity system based on onion routing.
  • Build a robust prototype of an interesting routing, replication, fault tolerance, leader election or other distributed algorithm from our readings, and thoroughly evaluate its performance, availability, fault tolerance, etc.
  • Formalize an existing implementation/proposal of a non-trivial routing, replication, fault tolerance, leader election or other distributed algorithm and prove that it is correct (or incorrect) in Coq.
  • Create a robust protocol fuzzing tool and apply it to popular distributed system prototypes.
  • Create a visualization tool that explains executions of distributed systems. Evaluate the tool on several existing systems.

I especially encourage project proposals that span other areas of computer science. Schedule a time to chat with me if you have an idea, or if you have a topic but need to translate it into a project idea.

The required project deliverables are listed below. In cases where the deliverable is a written paper, I would prefer that you share the doc with Ivan as an editable google doc. If you would rather use another submission approach, let me know. I would prefer final project reports in pdf format for an ACM SIG of your choice.

  • Project proposal draft: a paper outlining as much of the project proposal as possible. See proposal advice for more information on what is expected in the proposal draft. The draft does not need to have well-defined milestones.
  • Project proposal: a paper detailing the problem, your proposed approach/solution, and a realistic timeline for your team. See proposal advice for more information. The proposal must detail three well-defined milestones. Each milestone must include (1) deliverables that you will share with me for the milestone, (2) a written document that explains the deliverables and their status. The best way to think of the milestones is as a contract: if I accept your proposal and you meet the milestone you describe, then you will receive the full mark for the milestone.
  • 1st project milestone: 1st milestone deliverables and a document that explains the deliverables and their status. For example, if the deliverable is code, then the document must describe what the code does, under what conditions it works, how to run and deploy the code, etc. If the deliverable is a dataset, then the document must explain the format of the data, what it means, how it was collected, how complete it is, etc.
  • 2nd project milestone: 2nd milestone deliverables and a document that explains the deliverables and their status.
  • 3rd project milestone: 3rd milestone deliverables and a document that explains the deliverables and their status.
  • Project update meeting: Your project group will meet with me to discuss your project status. We will discuss the first two milestones, your progress towards the third milestones, and any outstanding questions/concerns that you might have.
  • Project presentation: a presentation describing your project. I encourage you to demo your project during your talk. We will have presentations across two days. Each presentation will be followed by a QA period. Presentation timing details TBD.

    All groups should provide every team member with an equal amount of time to speak during the presentation slot. I don't care who presents what in the talk, but (1) everyone must speak about the project, and (2) everyone must speak for an equal amount of time.

    During the QA, I hope to hear from everyone on the team, as well. But, in some cases (e.g., technical questions directed at a specific part of the project) a specific person may be the only one who can properly answer a question, and that's fine.

    You can compose your slides/presentation in whatever format you wish. You can even present a demo if you can. An example outline of a presentation that I would expect would look something like this (think technical talk):

    • 1. Motivation: what did you work on this topic/high-level view of the project topic (esp. in the context of this course)
    • 2. Background: anything that the audience needs to know to understand your talk
    • 3. Requirements/goals: what were you intending to build
    • 4. Design: what was the design of what you've build
    • 5. Implementation: any interesting implementation details
    • 6. Evaluation methodology: what questions were you intending to answer in the eval/what was the structure of your eval
    • 7. Evaluation results: results from your evaluation

  • Prototype implementation/experiments: must involve substantial development/experimental effort. The final prototype/data must be shared with Ivan preferably as a git repository at the same time as the project report.
  • Project report: a paper detailing the problem, your approach/solution, your prototype/experiments, and analysis/evaluation. The final report (due at the end of the term) should be no longer than 10 pages (excluding references) and should resemble a research paper. See final report instructions for more information.

The project is structured as a series of regularly occurring deadlines, listed in the schedule above and below. Do not miss these! The deadline deliverable must be submitted by email to the instructor by 6PM on the day of the deadline.

Timeline of project deliverables:

  • October 5: Project proposal drafts
  • October 15: Finalized project proposals
  • October 29: 1st project milestone
  • November 15: 2nd project milestone + email to schedule meeting with Ivan
  • November 29: 3rd project milestone
  • December 2,7: In-class project presentations
  • December 21 : Project reports and code

Grading

Final course mark will be based off of class participation, paper responses, and project deliverables.

  • Class/piazza participation: 10%
  • Paper responses: 30%
  • Project: 60%
    • Draft proposal: 5%
    • Final proposal: 10%
    • 1st milestone: 7%
    • 2nd milestone: 7%
    • 3rd milestone: 7%
    • Presentation: 4%
    • Final report and code: 20%

All members of the project team will receive the same mark for all of the project components. If you are experiencing difficulties with your team, please approach me as early as possible so that I can help.

The proposal, presentation, and final report/code are all required components. You cannot pass the course without having each of these. Specifically, you will receive a 0 for the project if any of these are missing.

The mark for class/piazza participation (10%) is based on four factors:

  • Regular course attendance and any contributions to piazza beyond paper responses: 5%
  • Regular participation in the in-class paper discussions: 3%
  • Helping to lead discussion during advocate/skeptic roles: 2%
I am happy to discuss situations that make it difficult for you to attend class. Please reach out to me.

How to do well in this course

Be prepared to participate in in-class discussion. This is a seminar-style course, which means that most of the class time will be devoted to discussion. The best way to prepare for class is to read the assigned paper(s), write a thoughtful response, and then read and carefully consider the responses submitted by your peers. Periodically re-read the readings from the first day of class and work to improve your paper reading and responding abilities.

Plan your reading time. The readings will likely challenge you. I recommend allocating an explicit time slot each week for reading the papers and for thinking about the papers. Note that some readings will be more difficult than others. Jump ahead and note the readings that are particularly long, theoretical, or may be especially challenging to you.

Invest time into the project. Put in consistent and weekly effort into the project. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Reach out for success and be proactive. There are no explicit office hours for this course. Email and schedule a time to chat with the instructor to discuss the course, the project, etc.

University students often encounter setbacks from time to time that can impact academic performance. Discuss your situation with your instructor or an academic advisor. Learn about how you can plan for success at: www.students.ubc.ca. For help addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit: https://students.ubc.ca/health-wellness

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Paper responses. Paper responses must be written individually. You are free to discuss the readings with other students, but write your responses on your own. Cite and attribute points from discussions with other students or external sources that you have read in your response.

Projects. You are free to use any code you find in your project. However, a non-trivial fraction of functionality in your prototype must be constructed by your team. You must cite and attribute sources of the code that you borrow/utilize in your project.

If you do discuss the project outside your team or use external resources (e.g., a StackOverflow question) then you must cite and attribute your sources in a README distributed with your project. Stating the source is insufficient: you should explain what was discussed/found and how you have used this information in your project.