Distributed Systems Abstractions

538B, Fall 2025, Ivan Beschastnikh (bestchai@cs.ubc.ca)

Wed 11am-2pm PST, in person, SWING 207 (UBC Swing Space Building)

Office hours: by appointment

Course description

Distributed systems form the infrastructure for much of our daily computing experience. Popular internet services like Google search, Facebook, and Amazon are all implemented as distributed systems. Many of these systems have been repurposed to provide compute and storage as a cloud service to other companies, such as Airbnb. To bootstrap a tech startup today you simply pay for AWS or Azure to provide you with nearly unbounded and elastic capacity. Computer networks, from your home router to international ISPs, are also distributed systems: they are all in a constant state of distributed coordination. Even your multi-core laptop has much in common with a distributed system.

Being infrastructure, distributed systems are rarely in the limelight. The purpose of this course is to highlight these systems and the beauty behind their designs. I posit that to know the 'stack' and to engineer robust and high-performing systems today (and even more so in the future) requires familiarity with distributed systems.

This course will cover a broad range of topics. We will often look back to classic papers that introduced core concepts that structure many of the existing designs. We will also discuss contemporary papers that document the systems powering commercial services such as Gmail.

Unlike standard courses on distributed systems, this course will focus on abstractions. Dijkstra has a wonderful quote that gets to the heart of abstractions in computer science:

The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

Powerful abstractions are a common theme in distributed systems (and really all software and hardware computing systems). This course will focus on a select number of abstractions that have persisted over time and continue to influence modern system designs. For example, some abstractions capture notions of coordination (consensus), others consider fault tolerance (replicated state machines) and consistency semantics (eventual consistency). These abstractions are worth studying because each one provides an accessible entry into the design of what are generally highly complex artifacts.

Course-level learning goals

By the end of this course participants will be able to

understand key principles involved in designing and implementing distributed systems
reason about problems that involve distributed components
use abstractions to solve problems that arise in distributed contexts
prototype advanced distributed systems

In-person format

The discussion in each class will focus on 2, and sometimes 3, papers
Everyone must post a response for each of the assigned papers to Piazza, due at least 24 hours before class
Attendance is mandatory, since the class meets only once per week and we have few sessions
Students are welcome to bring lunch and eat during class. We will also have a break in the middle.

Schedule (a work in progress)

Sep 3 Wed	Introductions, course overview, and academic paper reading How to Read a Paper, Srinivasan Keshav. Writing reviews for systems conferences, Timothy Roscoe. Do not write responses for these readings. Other introductory readings: Google. Introduction to Distributed System Design.
Sep 10 Wed	Ordering of events and distributed snapshots Baldoni and Raynal. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems. IEEE Distributed Systems Online 2002. Chandy and Lamport. Distributed Snapshots: Determining the Global States of a Distributed System. TOCS 1985.	A1: Zihan S1: Ard A2: Jiyeon S2: Oleg
Sep 17 Wed	Distributed state and replicated state machines Ousterhout. The Role of Distributed State. TR 1990. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. CSUR 1990. Project speed dating in class.	A/S 1: Shuo A/S 2: Partha
Sep 24 Wed	Consensus: Paxos Lamport. Paxos Made Simple. 2001 Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006. Related: Paxos Made Practical Paxos Made Live - An Engineering Perspective (More Chubby experiences) Hunt et al. ZooKeeper: Wait-free coordination for Internet-scale systems. USENIX ATC 2010. Tolerating latency in replicated state machines through client speculation, NSDI 2009.	A/S 1: Shuo A/S 2: Parshan
Oct 1 Wed	Byzantine Fault Tolerance Castro and Liskov. Practical Byzantine Fault Tolerance Yin et al. HotStuff: BFT Consensus in the Lens of Blockchain Related: Lamport, Shostak, and Pease. The Byzantine Generals Problem Project proposal drafts due October 1st at 18:00 PST	A/S 1: Ard A/S 2: Koh
Oct 8 Wed	Verification and model checking Wilcox et al. Verdi: A Framework for Implementing and Formally Verifying Distributed Systems. PLDI 2015. Leesatapornwongsa et al. SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems. OSDI 2014. Related: Hawblitzel et al. IronFleet: Proving Practical Distributed Systems Correct. SOSP 2015. Killian et al. Mace: Language Support for Building Distributed Systems. PLDI 2007. Yang et al. MODIST: Transparent Model Checking of Unmodified Distributed Systems. NSDI 2009. Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code. NSDI 2007. Finalized project proposals due October 8th at 18:00 PST	A/S 1: Jiyeon A/S 2: Partha
Oct 15 Wed	Distributed tracing Sigelman et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. TR 2010. Zhang et al. The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems. NSDI 2023. Related: Kaldor et al. Canopy: An End-to-End Performance Tracing And Analysis System. SOSP 2017. Mace et al. Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems. SOSP 2015. Fonseca et al. X-Trace: A Pervasive Network Tracing Framework. NSDI 2007.	A/S 1: Zihan A/S 2: Oleg
Oct 22 Wed	Bridging design and implementation Hackett et al. Compiling Distributed System Models with PGo. ASPLOS 2023. Hackett et al. TraceLinking Implementations with their Verified Designs. OOPSLA 2025. Related: Liskov. Distributed Programming in Argus. CACM 1988. First project milestone due October 22nd at 18:00 PST	A/S 1: Koh A/S 2: Jiyeon
Oct 29 Wed	Data processing: Spark and TensorFlow Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. NSDI 2012. Abadi et al. TensorFlow: A System for Large-Scale Machine Learning. OSDI 2016. Related: Scaling distributed machine learning with the Parameter Server. OSDI 2014. Related: Dean and Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004. Scalability! But at what COST?	A/S 1: A/S 2:
Nov 5 Wed	CAP theorem Gilbert and Lynch. Perspectives on the CAP Theorem. Computer J. 2012. Brewer. CAP Twelve Years Later: How the "Rules" Have Changed. Related: Corbett et al. Spanner: Google's Globally-Distributed Database. OSDI 2012.	None
Nov 12 Wed	*Fall midterm break* Second project milestone due November 12th at 18:00 PST
Nov 19 Wed	Eventual consistency and DHTs No class on Nov 19 Optional discussion on Nov 12 (see Piazza) Vogels. Eventually consistent Shapiro et al. Conflict-free Replicated Data Types. TR 2011. Related: Stoica et al. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. TON 2003. Kademlia: A P2P Information System Based on the XOR Metric Using Lightweight Modeling To Understand Chord Video: Strong Eventual Consistency and Conflict-free Replicated Data Types Readings in conflict-free replicated data types Demers et al. Epidemic algorithms for replicated database maintenance Eventual Consistency Today: Limitations, Extensions, and Beyond	None
Nov 26 Wed	P2P: BitTorrent, Bitcoin, Ethereum Cohen. Incentives build robustness in BitTorrent Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System Nielsen. How the Bitcoin protocol actually works (2-column pdf version) Buterin. A Next-Generation Smart Contract and Decentralized Application Platform. 2013. Related: Levin et al. BitTorrent is an Auction: Analyzing and Improving BitTorrent's Incentives Dwork and Naor. Pricing via Processing or Combatting Junk Mail Third project milestone due November 26th at 18:00 PST	A/S 1: A/S 2: A/S 3:
Dec 3 Wed	Privacy: Tor Dingledine et al. Tor: The Second-Generation Onion Router Christin. Traveling the silk road: a measurement analysis of a large anonymous online marketplace.	A/S 1: A/S 2:
TBD	Project presentations Final project reports due December 21st at 18:00 PST

Paper responses

For each of the assigned readings in the schedule above you must compose a 1-2 paragraph response. (The schedule sometimes lists optional readings; you do not need to respond to these). You should post your response on Piazza at least 24 hours before class in the thread with the title/date for the paper. See paper responses instructions for more information.

Everyone will have access to all the other students' response submissions. Please read them before class. Reading the other responses is a good way for you to gain perspective. You can see what you missed (or what other people missed), and whether you agree with their take on the key ideas. It will help to make the class sessions more productive.

The response will be graded using the following scale:

1 : the response engages with the reading in some depth
0 : no response was posted, or it was posted late, or the response is unreadable, off topic, or offers no insight into the reading and/or is a simple re-statement of some text in the reading (e.g., the abstract).

Everyone will be assigned to be an advocate/skeptic for a reading at least once during the term (likely 2-3 times depending on the final number of students in the class). If you are an advocate or a skeptic for the assigned reading, then you do not have to submit a response. See information about advocate and skeptic roles.

The advocate/skeptic roles will be graded using the following scale:

1 : the advocate/skeptic helped to promote in-class discussion and illustrated in-depth understanding of the reading.
0 : advocate/skeptic did not attend class, did not seem to read the assigned reading, or was entirely un-engaged during in-class discussion.

Project

The project must address a non-trivial problem relevant to distributed systems. The project can resolve the problem by building a system, by collecting data/carrying out experiments, by developing algorithms and proving them correct, etc. I strongly prefer that you do your project in a team of 2-3 people.

Here are some project ideas (do not limit yourself to these!):

Build a fault-tolerant parallel computing platform.
Build a peer-to-peer version of Dropbox.
Build a record-replay tool for distributed Go programs.
Build a distributed state assertions library for Go.
Extend Go's synctest (for concurrency testing) to distributed systems.
Build an anonymity system based on onion routing.
Build a robust prototype of an interesting routing, replication, fault tolerance, leader election or other distributed algorithm from our readings, and thoroughly evaluate its performance, availability, fault tolerance, etc.
Formalize an existing implementation/proposal of a non-trivial routing, replication, fault tolerance, leader election or other distributed algorithm and prove that it is correct (or incorrect) in Rocq/TLA+/F*/etc.
Create a robust protocol fuzzing tool and apply it to popular distributed system prototypes.
Create a visualization tool that explains executions of distributed systems. Evaluate the tool on several existing systems.

I especially encourage project proposals that span other areas of computer science. Schedule a time to chat with me if you have an idea, or if you have a topic but need to translate it into a project idea.

The required project deliverables are listed below. Where the deliverable is a written paper, I would prefer that you share it with me as an editable Google Doc. If you would rather use another submission approach, let me know. I would prefer final project reports as PDFs formatted for an ACM SIG of your choice.

Project proposal draft: a paper outlining as much of the project proposal as possible. See proposal advice for more information on what is expected in the proposal draft. The draft does not need to have well-defined milestones.
Project proposal: a paper detailing the problem, your proposed approach/solution, and a realistic timeline for your team. See proposal advice for more information. The proposal must detail three well-defined milestones. Each milestone must include (1) deliverables that you will share with me for the milestone, (2) a written document that explains the deliverables and their status. The best way to think of the milestones is as a contract: if I accept your proposal and you meet the milestone you describe, then you will receive the full mark for the milestone.
1st project milestone: 1st milestone deliverables and a document that explains the deliverables and their status. For example, if the deliverable is code, then the document must describe what the code does, under what conditions it works, how to run and deploy the code, etc. If the deliverable is a dataset, then the document must explain the format of the data, what it means, how it was collected, how complete it is, etc.
2nd project milestone: 2nd milestone deliverables and a document that explains the deliverables and their status.
3rd project milestone: 3rd milestone deliverables and a document that explains the deliverables and their status.
Project update meeting: Your project group will meet with me to discuss your project status. We will discuss the first two milestones, your progress towards the third milestone, and any outstanding questions/concerns that you might have.
Project presentation: a presentation describing your project. I encourage you to demo your project during your talk. We will have presentations across two days. Each presentation will be followed by a Q&A period. Presentation timing details TBD.
All groups should provide every team member with an equal amount of time to speak during the presentation slot. I don't care who presents what in the talk, but (1) everyone must speak about the project, and (2) everyone must speak for an equal amount of time.

During the Q&A, I hope to hear from everyone on the team as well. But in some cases (e.g., technical questions directed at a specific part of the project) a specific person may be the only one who can properly answer a question, and that's fine.

You can compose your slides/presentation in whatever format you wish, and you are welcome to include a demo. An example outline of the kind of presentation I expect (think technical talk):
- 1. Motivation: why did you work on this topic/high-level view of the project topic (esp. in the context of this course)
- 2. Background: anything that the audience needs to know to understand your talk
- 3. Requirements/goals: what were you intending to build
- 4. Design: what was the design of what you've built
- 5. Implementation: any interesting implementation details
- 6. Evaluation methodology: what questions were you intending to answer in the eval/what was the structure of your eval
- 7. Evaluation results: results from your evaluation
Prototype implementation/experiments: must involve substantial development/experimental effort. The final prototype/data must be shared with Ivan, preferably as a git repository, at the same time as the project report.
Project report: a paper detailing the problem, your approach/solution, your prototype/experiments, and analysis/evaluation. The final report (due at the end of the term) should be no longer than 10 pages (excluding references) and should resemble a research paper. See final report instructions for more information.

The project is structured as a series of regularly occurring deadlines, listed in the schedule above and below. Do not miss these! The deadline deliverable must be submitted by email to the instructor by 6PM on the day of the deadline.

Timeline of project deliverables:

October 1: Project proposal drafts
October 8: Finalized project proposals
October 22: 1st project milestone
November 12: 2nd project milestone + email to schedule meeting with Ivan
November 26: 3rd project milestone
December 10: Project presentations
December 21: Project reports and code

Grading

The final course mark will be based on class participation, paper responses, and project deliverables.

Class/piazza participation: 10%
Paper responses: 30%
Project: 60%

Draft proposal: 5%
Final proposal: 10%
1st milestone: 7%
2nd milestone: 7%
3rd milestone: 7%
Presentation: 4%
Final report and code: 20%

All members of the project team will receive the same mark for all of the project components. If you are experiencing difficulties with your team, please approach me as early as possible so that I can help.

The proposal, presentation, and final report/code are all required components. You cannot pass the course without having each of these. Specifically, you will receive a 0 for the project if any of these are missing.

The mark for class/piazza participation (10%) is based on three factors:

Regular course attendance: 5%
Regular participation in the in-class paper discussions: 2.5%
Helping to lead discussion during advocate/skeptic roles: 2.5%

I am happy to discuss situations that make it difficult for you to attend class or participate in the in-class paper discussions. Please reach out to me.
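
As a quick sanity check of the weights listed above, the following minimal Python sketch adds up the components and computes a final mark. The function and variable names are illustrative and not part of the course materials. The sketch makes two assumptions that the syllabus does not state: the paper-response component scales with the fraction of responses that earned a 1, and "proposal" in the required-components rule refers to the finalized proposal. It also does not model the separate rule that you cannot pass the course without the required components.

```python
# Weights in percent of the final course mark, copied from the grading section.
TOP_LEVEL = {"participation": 10.0, "paper_responses": 30.0, "project": 60.0}
PARTICIPATION = {"attendance": 5.0, "discussion": 2.5, "advocate_skeptic": 2.5}
PROJECT = {
    "draft_proposal": 5.0,
    "final_proposal": 10.0,
    "milestone_1": 7.0,
    "milestone_2": 7.0,
    "milestone_3": 7.0,
    "presentation": 4.0,
    "final_report_and_code": 20.0,
}
# Assumption: "proposal" in the required-components rule means the final proposal.
REQUIRED = ("final_proposal", "presentation", "final_report_and_code")


def weighted_sum(weights, fractions):
    """Sum weight * earned fraction (0.0 to 1.0); missing entries count as 0."""
    return sum(w * fractions.get(name, 0.0) for name, w in weights.items())


def course_mark(participation, response_fraction, project, submitted):
    """Return a final mark out of 100 (hypothetical helper, not an official formula).

    participation and project map component names to earned fractions in [0, 1].
    response_fraction is the assumed share of paper responses graded 1.
    submitted is the set of project components that were handed in.
    """
    project_mark = weighted_sum(PROJECT, project)
    if any(name not in submitted for name in REQUIRED):
        project_mark = 0.0  # a missing required component zeroes the project
    return (
        weighted_sum(PARTICIPATION, participation)
        + TOP_LEVEL["paper_responses"] * response_fraction
        + project_mark
    )


# The sub-weights add up to their top-level categories, and the total is 100.
assert sum(PROJECT.values()) == TOP_LEVEL["project"]
assert sum(PARTICIPATION.values()) == TOP_LEVEL["participation"]
assert sum(TOP_LEVEL.values()) == 100.0
```

Under these assumptions, a student with full marks on every component but no presentation would end up with 40 out of 100, because the whole 60% project component drops to 0.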

How to do well in this course

Be prepared to participate in in-class discussion. This is a seminar-style course, which means that most of the class time will be devoted to discussion. The best way to prepare for class is to read the assigned paper(s), write a thoughtful response, and then read and carefully consider the responses submitted by your peers. Periodically re-read the readings from the first day of class and work to improve your paper reading and responding abilities. Reading/skimming the related papers listed in the schedule is optional but can be helpful for further perspective on the topic.

Plan your reading time. The readings will likely challenge you. I recommend allocating an explicit time slot each week for reading the papers and for thinking about the papers. Note that some readings will be more difficult than others. Jump ahead and note the readings that are particularly long, theoretical, or may be especially challenging to you.

Invest time into the project. Put consistent weekly effort into the project. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Reach out for success and be proactive. There are no explicit office hours for this course. Email and schedule a time to chat with the instructor to discuss the course, the project, etc.

University students often encounter setbacks that can impact academic performance. Discuss your situation with your instructor or an academic advisor. Learn about how you can plan for success at: www.students.ubc.ca. For help addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit: https://students.ubc.ca/health-wellness

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Paper responses. Paper responses must be written individually. You are free to discuss the readings with other students, but write your responses on your own. Cite and attribute points from discussions with other students or external sources that you have read in your response.

Projects. You are free to use any code you find in your project. However, a non-trivial fraction of functionality in your prototype must be constructed by your team. You must cite and attribute sources of the code that you borrow/utilize in your project.

If you do discuss the project outside your team or use external resources (e.g., a StackOverflow question) then you must cite and attribute your sources in a README distributed with your project. Stating the source is insufficient: you should explain what was discussed/found and how you have used this information in your project.

AI usage

Generative AI tools are becoming part of the modern computing landscape. In this course, you may use such tools, but with important caveats:

Reading responses: These short written reflections are intended to help you develop your own critical perspective on research ideas. To ensure you get the most benefit, you should not use generative AI to draft or edit your responses. Your writing should be entirely your own.
Understanding readings and concepts: You may use AI tools to help clarify difficult concepts or summarize material, but keep in mind that these tools sometimes generate misleading or incorrect explanations. Always cross-validate what you learn from AI with the papers themselves, lecture material, or other reliable sources.
Course project: AI tools may be useful for implementation (e.g., generating boilerplate code, experimenting with API calls, debugging). Their use is encouraged, provided that:
- You fully understand any code you submit.
- You have processes in place (testing, code review, reasoning about correctness) to validate AI-generated output.
- You document in your final project report how you used AI tools (e.g., example prompts, their role in your workflow).

General notes:
Using AI tools does not replace your own learning. Over-reliance on them will hurt your ability to perform on exams (in other courses) and do great research. When in doubt: write yourself, and always be transparent about your use of AI.