Distributed Systems Abstractions

538B, Fall 2020, Ivan Beschastnikh (bestchai@cs.ubc.ca)

Tue/Thu 8:00-9:30 AM PST, Online, UBC course page

Office hours by appointment

Course description

Distributed systems form the infrastructure for much of our daily computing experience. Popular internet services like Google search, Facebook, and Amazon are all implemented as distributed systems. Many of these systems have been repurposed to provide compute and storage as a cloud service to other companies, such as Airbnb. To bootstrap a tech startup today you simply pay for AWS or Azure to provide you with nearly unbounded and elastic capacity. Computer networks, from your home router to international ISPs, are also distributed systems: they are all in a constant state of distributed coordination. Even your multi-core laptop has much in common with a distributed system.

Being infrastructure, distributed systems are rarely in the limelight. The purpose of this course is to highlight these systems and the beauty behind their designs. I posit that to know the 'stack' and to engineer robust and high-performing systems today (and even more so in the future) requires familiarity with distributed systems.

This course will cover a broad range of topics. We will often look back to classic papers that introduced the core concepts that structure many existing designs. We will also discuss contemporary papers that document the systems powering commercial services such as Gmail.

Unlike standard courses on distributed systems, this course will focus on abstractions. Dijkstra has a wonderful quote that gets to the heart of abstractions in computer science:

The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

Powerful abstractions are a common theme in distributed systems (and really all software and hardware computing systems). This course will focus on a select number of abstractions that have persisted over time and continue to influence modern system designs. For example, some abstractions capture notions of coordination (consensus), others consider fault tolerance (replicated state machines) and consistency semantics (eventual consistency). These abstractions are worth studying because each one provides an accessible entry into the design of what are generally highly complex artifacts.
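To make "eventual consistency" a little more concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest conflict-free replicated data types covered later in the schedule. It is an illustration only, not part of the course materials; the class name `GCounter` and its method names are hypothetical. Each replica increments its own entry, and replicas converge by taking the per-replica maximum when they exchange state.

```python
# Illustrative sketch only: a grow-only counter (G-Counter).
# The class and method names are hypothetical, chosen for this example.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        """Record a local increment under this replica's own entry."""
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum of all replicas' entries."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Merge by per-replica maximum: commutative, associative, idempotent."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)      # replica a counts two events
    b.increment(3)      # replica b counts three events, concurrently
    a.merge(b)          # the replicas exchange state in any order...
    b.merge(a)
    b.merge(a)          # ...and repeated merges change nothing
    print(a.value(), b.value())
```

Because the merge is order-insensitive and idempotent, replicas that have seen the same set of updates report the same value regardless of how messages were delayed or duplicated. That precision about what "eventually agree" means is the kind of semantic level the Dijkstra quote describes.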

Course-level learning goals

By the end of this course participants will be able to

understand key principles involved in designing and implementing distributed systems
reason about problems that involve distributed components
use abstractions to solve problems that arise in distributed contexts
prototype advanced distributed systems

Online format

The course will be held entirely online in a synchronous/live format. That is, students and the instructor will meet each week at the same time online as if we were meeting for a physical class. This format is adopted due to the seminar-style nature of the course. The course is driven through in-class discussions of the readings and a live discussion is the best way to further our understanding as a group.

The readings are available online and are linked to from the schedule below (a work in progress). Most of the readings are research papers; there is no textbook. The course will also include a course project (the bulk of your course mark).

The discussion in each class will focus on 1-2 papers
Everyone must post a response for each of the assigned papers to Piazza, due at least 18 hours before class

If you are in the course, you should start with the following tasks:

Figure out your local setup so that you can participate in online Zoom sessions.
Log in to UBC Canvas, which lists all Zoom info.
Sign up for the class Piazza instance. We will use this for async class communication.

Schedule (a work in progress)

Sep 10 Thu	Introductions, course overview, and academic paper reading [slides] How to Read a Paper, Srinivasan Keshav. Writing reviews for systems conferences, Timothy Roscoe. Do not write responses for these readings. Other introductory readings: Google. Introduction to Distributed System Design.
Sep 15 Tue	Clocks and ordering of events [class notes] Baldoni and Raynal. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems. IEEE Distributed Systems Online 2002.	A1:Shiqi S1:Mayank
Sep 17 Thu	Distributed snapshots [class notes] Chandy and Lamport. Distributed Snapshots: Determining the Global States of a Distributed System. TOCS 1985.	A1:Lucca S1:Fangyu
Sep 22 Tue	Distributed state [class notes] Ousterhout. The Role of Distributed State. TR 1990.	A1:Finn S1:Joseph
Sep 24 Thu	Replicated state machines [class notes] Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. CSUR 1990.	A1:Sasha S1:---
Sep 29 Tue	Paxos [class notes] Lamport. Paxos Made Simple. 2001. Related: Paxos Made Practical.	A1:Ali S1:Jianyu
Oct 1 Thu	Coordination services [class notes] Hunt et al. ZooKeeper: Wait-free coordination for Internet-scale systems. USENIX ATC 2010. Related: Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006. Paxos Made Live - An Engineering Perspective (more Chubby experiences). Tolerating latency in replicated state machines through client speculation. NSDI 2009. Project speed dating in class via Zoom breakout rooms.	A1:Lucca S1:Finn
Oct 6 Tue	Byzantine Fault Tolerance [class notes] Castro and Liskov. Practical Byzantine Fault Tolerance. Related: Lamport, Shostak, and Pease. The Byzantine Generals Problem.	A1:Fangyu S1:Ali
Oct 8 Thu	Remote procedure calls [class notes] Birrell and Nelson. Implementing Remote Procedure Calls. Project proposal drafts due October 9th at 18:00 PST.	A1:Jerry S1:Noa
Oct 13 Tue	Programming frameworks: Argus [class notes] Liskov. Distributed Programming in Argus. CACM 1988.	A1:Finn S1:Jianyu
Oct 15 Thu	Programming frameworks: distributed objects [class notes] Jul et al. Fine-grained mobility in the Emerald system. TOCS 1988. Finalized project proposals due October 16th at 18:00 PST.	A1:Jerry S1:Sasha
Oct 20 Tue	Language support [class notes] Killian et al. Mace: Language Support for Building Distributed Systems. PLDI 2007.	A1:Mayank S1:Shiqi
Oct 22 Thu	Verification [class notes] Wilcox et al. Verdi: A Framework for Implementing and Formally Verifying Distributed Systems. PLDI 2015. Related: Hawblitzel et al. IronFleet: Proving Practical Distributed Systems Correct. SOSP 2015.	A1:Sasha S1:Junfeng
Oct 27 Tue	Model checking [class notes] Yang et al. MODIST: Transparent Model Checking of Unmodified Distributed Systems. NSDI 2009. Related: Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code. NSDI 2007.	A1:Junfeng S1:Noa
Oct 29 Thu	Tracing [class notes] Sigelman et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. TR 2010.	A1:Joseph S1:Lucca
Nov 3 Tue	Spark [class notes] Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. NSDI 2012. Related: Dean and Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004. Scalability! But at what COST?	A1:Mayank S1:Shiqi
Nov 5 Thu	TensorFlow [class notes] Abadi et al. TensorFlow: A System for Large-Scale Machine Learning. OSDI 2016. Related: Scaling distributed machine learning with the Parameter Server. OSDI 2014.	A1:Lucca S1:Mayank
Nov 10 Tue	CAP theorem [class notes] Gilbert and Lynch. Perspectives on the CAP Theorem. Computer J. 2012. Brewer. CAP Twelve Years Later: How the "Rules" Have Changed. Related: Corbett et al. Spanner: Google's Globally-Distributed Database. OSDI 2012.	A1:Mahyar S1:Niusha A2:Mahyar S2:Niusha
Nov 12 Thu	Weak consistency: CRDTs [class notes] Shapiro et al. Conflict-free Replicated Data Types. TR 2011. Related: Video: Strong Eventual Consistency and Conflict-free Replicated Data Types. Readings in conflict-free replicated data types. Project updates + email to schedule meetings with Ivan due November 13th at 18:00 PST.	A1:Sasha S1:Junfeng
Nov 17 Tue	Weak consistency: optimistic replication [class notes] Saito and Shapiro. Optimistic replication. CSUR 2005. Related: Vogels. Eventually consistent. Demers et al. Epidemic algorithms for replicated database maintenance. Eventual Consistency Today: Limitations, Extensions, and Beyond.	A1:Noa S1:Joseph
Nov 19 Thu	Distributed Hash Tables [class notes] Stoica et al. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. TON 2003. Related: Kademlia: A P2P Information System Based on the XOR Metric. Using Lightweight Modeling To Understand Chord.	A1:Niusha S1:Mahyar
Nov 24 Tue	P2P: BitTorrent and Bitcoin [class notes] Cohen. Incentives build robustness in BitTorrent. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System. Nielsen. How the Bitcoin protocol actually works (2-column pdf version). Related: Levin et al. BitTorrent is an Auction: Analyzing and Improving BitTorrent's Incentives. Dwork and Naor. Pricing via Processing or Combatting Junk Mail.	A1:Fangyu S1:Mahyar A2:Shiqi S2:Fangyu
Nov 26 Thu	Permissioned blockchains [class notes] Androulaki et al. Hyperledger fabric: a distributed operating system for permissioned blockchains.	A1:Ali S1:Jianyu
Dec 1 Tue	In-class project presentations
Dec 3 Thu	In-class project presentations
Dec 11	Final project reports and code due at 18:00 PST

Paper responses

For each of the assigned readings in the schedule above you must compose a 1-2 paragraph response. (The schedule sometimes lists optional readings; you do not need to respond to these.) Post your response on Piazza at least 18 hours before class in the thread with the title/date for the paper. See paper responses instructions for more information.

Everyone will have access to all the other students' response submissions. Please read them before class. Reading the other responses is a good way for you to gain perspective. You can see what you missed (or what other people missed), and whether you agree with their take on the key ideas. It will help to make the class sessions more productive.

Each response will be graded using the following scale:

1 : the response engages with the reading in some depth
0 : no response was posted, or it was posted late, or the response is unreadable, off topic, or offers no insight into the reading and/or is a simple re-statement of some text in the reading (e.g., the abstract).

Everyone must sign up to be an advocate/skeptic for a reading at least once during the term (likely 2-3 times depending on the final number of students in the class). If you are an advocate or a skeptic for the assigned reading, then you do not have to submit a response. See information about advocate and skeptic roles.

The advocate/skeptic roles will be graded using the following scale:

1 : the advocate/skeptic helped to promote in-class discussion and illustrated in-depth understanding of the reading.
0 : advocate/skeptic did not attend class, did not seem to read the assigned reading, or was entirely un-engaged during in-class discussion.

Project

The project must address a non-trivial problem relevant to distributed systems. The project can resolve the problem by building a system, by collecting data/carrying out experiments, by developing algorithms and proving them correct, etc. I strongly prefer that you do your project in a team of 2-3 people, but this is not a strict requirement considering that you cannot collaborate in person.

Here are some project ideas (do not limit yourself to these!):

Build a fault-tolerant parallel computing platform.
Build a peer-to-peer version of Dropbox.
Build a record-replay tool for distributed Go programs.
Build a distributed state assertions library for Go.
Build an anonymity system based on onion routing.
Build a robust prototype of an interesting routing, replication, fault tolerance, leader election or other distributed algorithm from our readings, and thoroughly evaluate its performance, availability, fault tolerance, etc.

I especially encourage project proposals that span other areas of computer science. Schedule a time to chat with me if you have an idea, or if you have a topic but need to translate it into a project idea.

The required project deliverables are listed below. Where the deliverable is a written paper, I would prefer that you share it with me as an editable Google Doc. If you would rather use another submission approach, let me know. I would prefer final project reports in PDF format, formatted for an ACM SIG of your choice.

Project proposal draft: a paper outlining as much of the project proposal as possible. This draft is required, but it will not be marked. However, if you do not submit a proposal draft, your proposal mark goes down by 50%. See proposal advice for more information.
Project proposal: a paper detailing the problem, your proposed approach/solution, and a realistic timeline for your team. See proposal advice for more information.
Project update: an update of at most two pages on your project: what you have done so far, initial results, what remains to be done, unexpected challenges you have encountered, etc. This update will define the agenda for your project update meeting with Ivan.
Project update meeting: Your project group will meet with me to discuss your project status. We will focus on the project update, and any outstanding questions that you might have.
Project presentation: a presentation describing your project. I encourage you to demo your project (if you built a system) during your talk. We will have presentations across two days: December 1st and 3rd. We have 9 groups, so 5 groups will present on the 1st and 4 groups will present on the 3rd.
Every group will have 17 minutes, split into 12 minutes for the talk and 5 minutes for the question/answer period. I will stop you at 12 minutes so that we can proceed with the QA. The 12 minutes divide evenly by 1, 2, 3, and 4. All groups should give every team member an equal amount of time to speak during the presentation slot. I don't care who presents what in the talk, but (1) everyone must speak about the project, and (2) everyone must speak for an equal amount of time.

During the QA, I hope to hear from everyone on the team, as well. But, in some cases (e.g., technical questions directed at a specific part of the project) a specific person may be the only one who can properly answer a question, and that's fine.

You will be presenting via Zoom screen sharing, so you can compose your slides/presentation in whatever format you wish. You can even present a demo, although 12 minutes doesn't allow for much time. An example outline of a presentation that I would expect would look something like this (think technical talk):
- 1. Motivation: why did you work on this topic/high-level view of the project topic (esp. in the context of this course)
- 2. Background: anything that the audience needs to know to understand your talk
- 3. Requirements/goals: what were you intending to build
- 4. Design: what was the design of what you've built
- 5. Implementation: any interesting implementation details
- 6. Evaluation methodology: what questions were you intending to answer in the eval/what was the structure of your eval
- 7. Evaluation results: results from your evaluation
Prototype implementation/experiments: must involve substantial development/experimental effort. The final prototype/data must be shared with Ivan preferably as a git repository at the same time as the project report.
Project report: a paper detailing the problem, your approach/solution, your prototype/experiments, and analysis/evaluation. The final report (due at the end of the term) should be no longer than 10 pages and should resemble a research paper. See final report instructions for more information.

The project is structured as a series of regularly occurring deadlines, listed in the schedule above and in the timeline below. Do not miss these! Each deliverable must be submitted by email to the instructor by 18:00 PST on the day of its deadline.

Timeline of project deliverables:

October 9th: Project proposal drafts
October 16th: Finalized project proposals
November 13th: Project updates + email to schedule meetings with Ivan
December 1st and 3rd: In-class project presentations (17 minutes per group: 12 for the talk, 5 for questions)
December 11th: Project reports and code

Grading

The final course mark will be based on class participation, paper responses, and project deliverables.

Class participation: 10%
Paper responses: 30%
Project: 60%

Proposal draft: unmarked, but required
Proposal: 20% (team)
Update: 10% (team)
Presentation: 5% (team) + 5% (individual)
Final report and code: 20% (team)

Note that the team's mark for the proposal, update, and final report is the same for all team members. For project presentations each team member will receive a team mark and an individual mark.

The mark for class participation (10%) is based on three factors:

Regular course attendance: 5%
Regular participation in the in-class paper discussions: 3%
Leading discussion during advocate/skeptic roles: 2%

Note that I am willing to discuss situations that make it difficult for you to attend the class. Please reach out to me.

How to do well in this course

Be prepared to participate in in-class discussion. This is a seminar-style course, which means that most of the class time will be devoted to discussion. The best way to prepare for class is to read the assigned paper(s), write a thoughtful response, and then read and carefully consider the responses submitted by your peers. Periodically re-read the readings from the first day of class and work to improve your paper reading and responding abilities.

Plan your reading time. The readings will likely challenge you. I recommend allocating an explicit time slot each week for reading the papers and for thinking about the papers. Note that some readings will be more difficult than others. Jump ahead and note the readings that are particularly long, theoretical, or may be especially challenging to you.

Invest time into the project. Put consistent weekly effort into the project. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Reach out for success and be proactive. There are no explicit office hours for this course. Email and schedule a time to chat with the instructor to discuss the course, the project, etc.

University students encounter setbacks from time to time that can impact academic performance. Discuss your situation with your instructor or an academic advisor. Learn about how you can plan for success at: www.students.ubc.ca. For help addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit: https://students.ubc.ca/health-wellness

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Paper responses. Paper responses must be written individually. You are free to discuss the readings with other students, but write your responses on your own. Cite and attribute points from discussions with other students or external sources that you have read in your response.

Projects. You are free to use any code you find in your project. However, a non-trivial fraction of functionality in your prototype must be constructed by your team. You must cite and attribute sources of the code that you borrow/utilize in your project.

If you do discuss the project outside your team or use external resources (e.g., a StackOverflow question) then you must cite and attribute your sources in a README distributed with your project. Stating the source is insufficient: you should explain what was discussed/found and how you have used this information in your project.

Considerations in online participation from outside of Canada

During this pandemic, the shift to online learning has greatly altered teaching and studying at UBC, including changes to health and safety considerations. Keep in mind that some UBC courses might cover topics that are censored or considered illegal by non-Canadian governments. This may include, but is not limited to, human rights, representative government, defamation, obscenity, gender or sexuality, and historical or current geopolitical controversies. If you are a student living abroad, you will be subject to the laws of your local jurisdiction, and your local authorities might limit your access to course material or take punitive action against you. UBC is strongly committed to academic freedom, but has no control over foreign authorities (please visit this page for an articulation of the values of the University conveyed in the Senate Statement on Academic Freedom). Thus, we recognize that students will have legitimate reason to exercise caution in studying certain subjects. If you have concerns regarding your personal situation, consider postponing a course with manifest risks until you are back on campus, or reach out to your academic advisor to find substitute courses. For further information and support, please visit this page.