Distributed Systems

Distributed Systems

538B, Winter 2019, Ivan Beschastnikh (bestchai@cs.ubc.ca)

Tue/Thu 930-11AM, MacLeod 220, UBC course page

Course piazza

Office hours by appointment


Course description

Distributed systems form the infrastructure for much of our daily computing experience. Popular internet services like Google search, Facebook, and Amazon are all implemented as distributed systems. Many of these systems have been repurposed to provide compute and storage as a cloud service to other companies, such as Airbnb. To bootstrap a tech startup today you simply pay for AWS or Azure to provide you with nearly unbounded and elastic capacity. Computer networks, from your home router to international ISPs, are also distributed systems: they are all in a constant state of distributed coordination. Even your multi-core laptop has much in common with a distributed system.

Being infrastructure, distributed systems are rarely in the limelight. The purpose of this course is to highlight these systems and the beauty behind their designs. I posit that to know the 'stack' and to engineer robust and high-performing systems today (and even more so in the future) requires familiarity with distributed systems that ensnare our computing devices.

This course will cover a broad range of topics. We will often look back to classic papers that introduced core concepts that structure many of the existing designs. We will also discuss contemporary papers that document the systems powering commercial services such as GMail.

Besides paper readings the first half of the term will include hands-on assignments. These will get you to apply the concepts from the readings to build small but practical distributed system prototypes of your very own. In the second half of the term the assignment will be replaced with an open-ended project.

In this course we will be exclusively using the Go programming language for all assignments and the project. Go was designed with distributed systems in mind and you will come to appreciate this. Learning a new programming language is also an important skill and this course presents an opportunity to practice this skill. Note that Go will not be explicitly covered by the course. I assume that students can pick up the necessary language features as needed.

This course satisfies the PhD "Computer Systems and Design" breadth requirement.

Course-level learning goals

The course will provide an opportunity for participants to
  • understand key principles involved in designing and implementing distributed systems
  • reason about problems that involve distributed components
  • become familiar with important techniques for solving problems that arise in distributed contexts
  • develop advanced distributed systems using the Go programming language

Format

This seminar-style course will be primarily driven through in-class discussion of the readings. The readings are available online and are linked to from the schedule below (a work in progress). Most of the readings are research papers; there is no textbook. The course will also include an open-ended course project (the bulk of your course mark) that must be completed as part of a team. As well, there will be a take-home final exam.

  • The discussion in each class will focus on 1-2 papers
  • Everyone must post a response for each of the assigned papers to Piazza, due at least 24 hours before class

If you are in the course, you should start with the following tasks:

  • Complete the Go tutorial and start learning Go.
  • Sign up for the class Piazza instance. We will use this for all class communication.

Schedule (a work in progress)

Jan 3
Thu
Introductions, course overview, and academic paper reading
  1. How to Read a Paper, Srinivasan Keshav.
  2. Writing reviews for systems conferences, Timothy Roscoe.
Do not write responses for these readings.

Other introductory readings:
Jan 8
Tue
Clocks and ordering of events
  1. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. CACM 1978. (read up to Anomalous Behavior section).
  2. Baldoni and Raynal. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems. IEEE Distributed Systems Online 2002.
A1:Swati
S1:Hassan

A2:Chris
S2:Puneet
Jan 10
Thu
Distributed snapshots
  1. Chandy and Lamport. Distributed Snapshots: Determining the Global States of a Distributed System. TOCS 1985.
A1:Matthew
S1:Christian
Jan 15
Tue
Distributed state
  1. Ousterhout. The Role of Distributed State
A1: Felix
S1: Swati
Jan 17
Thu
Replicated state machines
  1. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial
A1: X
S1: X
Jan 22
Tue
Distributed shared memory 1/2

(Ivan out of town; Nodir and Fabian lead)

  1. Li. IVY: A Shared Virtual Memory System for Parallel Computing. 1988.
  2. Feeley et al. Implementing global memory management in a workstation cluster. SOSP 1995.
Related:
A1:
S1:

A2:
S2:
Jan 24
Thu
Distributed shared memory 2/2

(Ivan out of town; Nodir and Fabian lead)

  1. Keleher et al. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. WTEC 1994.
A1:
S1:
Jan 29
Tue
Consensus theory
  1. Impossibility of Distributed Consensus with One Faulty Process (FLP result). 1985
  2. Ben-Or. Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols. PODC 1983.
A1:
S1:

A2:
S2:
Jan 31
Thu
Paxos
  1. Lamport. Paxos Made Simple. 2001
Related:
A1:
S1:
Feb 5
Tue
Paxos in deployed systems
  1. Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006.
  2. Hunt et al. ZooKeeper: Wait-free coordination for Internet-scale systems. USENIX ATC 2010.
Related:
A1:
S1:

A2:
S2:
Feb 7
Thu
Raft and viewstamped replication
  1. Ongaro and Ousterhout. In Search of an Understandable Consensus Algorithm. USENIX 2014.
  2. Liskov. Viewstamped Replication Revisited. 2010
Related:
A1:
S1:

A2:
S2:
Feb 12
Tue
Byzantine Fault Tolerance
  1. Lamport, Shostak, and Pease. The Byzantine Generals Problem
  2. Castro and Liskov. Practical Byzantine Fault Tolerance
Related:
A1:
S1:

A2:
S2:

Paper responses

For each of the assigned readings in the schedule above you must compose a 1-2 paragraph response. (The schedule sometimes lists optional readings; you do not need to respond to these). You should post your response on Piazza at least 24 hours before class in the thread with the title/date for the paper.

See paper responses instructions for more information.

You will have access to all the other students' submissions. Please read them, because reading the other responses is a good way for you to get perspective. You can see what you missed (or what other people missed), and whether you agree with their take on the key ideas. It will help to make the class sessions more productive.

The response will be graded using the following scale:

  • 1 : the response engages with the reading in some depth
  • 0 : no response was posted, or it was posted late, or the response is unreadable, off topic, or offers no insight into the reading and/or is a simple re-statement of some text in the reading (e.g., the abstract).

Late responses will not be accepted.

Assignments

TBD

Project

The project must address a non-trivial problem relevant to distributed systems. The project can resolve the problem by building a system, by collecting data/carrying out experiments, by developing algorithms and proving them correct, etc. I strongly prefer that you do your project in a team of 2-3 people, but this is not a strict requirement.

Here are some projects ideas (do not limit yourself to these!):

  • Build a fault-tolerant parallel computing platform.
  • Build a peer-to-peer version of DropBox.
  • Build a record-replay tool for distributed Go programs.
  • Build a distributed state assertions library for Go.
  • Build an anonymity system based on onion routing.
  • Build a robust prototype of an interesting routing, replication, fault tolerance, leader election or other distributed algorithm from our readings, and thoroughly evaluate its performance, availability, fault tolerance, etc.

I especially encourage project proposals that span other areas of computer science. Meet and chat with me if you have an idea, or if you have a topic but need to translate it into a project idea.

There are four project deliverables:

  • Project proposal: a paper detailing the problem, your proposed approach/solution, and a realistic timeline for your team. See proposal advice for more information.
  • Prototype implementation/experiments: must involve substantial development/experimental effort. The final prototype/data must be shared with the instructor (at the end of the term), preferably as an hg or git repository.
  • Project presentation: a 30 minute presentation describing your project. I encourage you to demo your project (if you built a system) during your talk. Presentation details TBD.
  • Project report: a paper detailing the problem, your approach/solution, your prototype/experiments, and analysis/evaluation. The final report (due at the end of the term) should be no longer than 10 pages and should resemble a research paper. See final report instructions for more information.

The project is structured as a series of regularly occurring deadlines, listed in the schedule above and below. Do not miss these! The deadline deliverable must be submitted by email to the instructor by 6PM on the day of the deadline.

Final exam

Details TBD

Grading

Final course mark will be based off of:

  • Class participation: 10%
  • Paper responses: 20%
  • Assignment(s): 10%
  • Project: 50%
    • Proposal: 20% (team)
    • Presentation: 5% (team) + 5% (individual)
    • Final report: 20% (team)
  • Final exam: 10%

Note that the project must be a team effort. The team's mark for the proposal and final report is the same for all team members. For project presentations each team member will receive a team mark and an individual mark.

The mark for class participation is based on three factors:

  • Regular course attendance
  • Regular participation in the in-class paper discussions
  • Leading of discussion for advocate/skeptic assignments

How to do well in this course

Be prepared to participate in in-class discussion. This is a seminar-style course, which means that most of the class time will be devoted to discussion. The best way to prepare for class is to read the assigned paper(s), write a thoughtful response, and then read and carefully consider the responses submitted by your peers. Periodically re-read the readings from the first day of class and work to improve your paper reading and responseing abilities.

Invest the time into learning Go. As with any new programming language, your proficiency in the language will highly correlate with the amount of code you write: practice counts for a lot!

Plan you reading time. The readings will likely challenge you. I recommend allocating an explicit time slot each week for reading the papers and for thinking about the papers. Note that some readings will be more difficult than others. Jump ahead and note the readings that are particularly long, theoretical, or may be especially challenging to you.

Invest time into the project. Do not underestimate the importance of a thorough (and interesting!) project proposal. Proposal write-ups that are vague or are incomplete will not be accepted. Put in consistent and weekly effort into the project. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Reach out for success and be proactive. There are no explicit office hours for this course. Email and schedule a time to meet with the instructor to discuss the course, the project, etc.

University students often encounter setbacks from time to time that can impact academic performance. Discuss your situation with your instructor or an academic advisor. Learn about how you can plan for success at: www.students.ubc.ca. For help addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit: https://students.ubc.ca/health-wellness

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Paper responses. Paper responses must be written individually. You are free to discuss the readings with other students, but write your responses on your own. Cite and attribute points from discussions with other students or external sources that you have read in your response.

Assignments. You can talk with other students about the assignments. You cannot look at each other's code and you cannot share code. You can use Google and other resources to find code examples, but you should not rely on these directly by copy and pasting: you must implement your own solution from scratch. You can use standard Go libraries (as long as they don't directly solve the assignment for you); you cannot use any external libraries that are not distributed with Go.

If you do discuss the assignment with someone or use external resources (e.g., a StackOverflow question) then you must cite and attribute your sources in a README distributed with your assignment (or in body of email with your solution). Stating the source is insufficient: you should explain what was discussed/found and how you have used this information in your assignment solution.

Projects. The policy for project is more liberal than for assignments: you are free to use any code you find in your project (e.g., you can use external libraries that are not distributed with Go). However, a non-trivial fraction of functionality in your prototype must be constructed by your team. As with assignments, cite and attribute sources of the code that you borrow/utilize in your project.