Distributed Systems

Distributed Systems

538B, Fall 2016, Ivan Beschastnikh (bestchai@cs.ubc.ca)

Tues/Thur 11-12:30PM, DMP 101, UBC course page

Course piazza

Office hours by appointment


Course description

Distributed systems form the infrastructure for much of our daily computing experience. Popular internet services like Google search, Facebook, and Amazon are all implemented as distributed systems. Many of these systems have been repurposed to provide compute and storage as a cloud service to other companies, such as Airbnb. To bootstrap a tech startup today you simply pay for AWS or Azure to provide you with nearly unbounded and elastic capacity. Computer networks, from your home router to international ISPs, are also distributed systems: they are all in a constant state of distributed coordination. Even your multi-core laptop has much in common with a distributed system.

Being infrastructure, distributed systems are rarely in the limelight. The purpose of this course is to highlight these systems and the beauty behind their designs. I posit that to know the 'stack' and to engineer robust and high-performing systems today (and even more so in the future) requires familiarity with distributed systems that ensnare our computing devices.

This course will cover a broad range of topics. We will often look back to classic papers that introduced core concepts that structure many of the existing designs. We will also discuss contemporary papers that document the systems powering commercial services such as GMail.

Besides paper readings the first half of the term will include hands-on assignments. These will get you to apply the concepts from the readings to build small but practical distributed system prototypes of your very own. In the second half of the term the assignment will be replaced with an open-ended team project.

In this course we will be exclusively using the Go programming language for all assignments and the project. Go was designed with distributed systems in mind and you will come to appreciate this. Learning a new programming language is also an important skill and this course presents an opportunity to practice this skill. Note that Go will not be explicitly covered by the course. I assume that students can pick up the necessary language features as needed.

This course satisfies the PhD "Computer Systems and Design" breadth requirement.

Course-level learning goals

The course will provide an opportunity for participants to
  • understand key principles involved in designing and implementing distributed systems
  • reason about problems that involve distributed components
  • become familiar with important techniques for solving problems that arise in distributed contexts
  • develop advanced distributed systems using the Go programming language

Format

This seminar-style course will be primarily driven through in-class discussion of the readings. The readings are available online and are linked to from the schedule. Most of the readings are research papers; there is no textbook. The course will also include periodic assignments that are to be done individually, and an open-ended course project that must be completed as part of a team.

If you are in the course, you should start with the following tasks:

  • Complete the Go tutorial and start learning Go.
  • Sign up for the class Piazza instance. We will use this for all class communication.

Schedule (a work in progress)

Sep 6
Tue
No class (grad orientation)
Sep 8
Thu
Introductions, course overview, and academic paper reading
  1. How to Read a Paper, Srinivasan Keshav.
  2. Writing reviews for systems conferences, Timothy Roscoe.
Do not write reviews for these readings.
Sep 13
Tue
Clocks and snapshots
  1. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System (read up to Anomalous Behavior section).
  2. Mandy and Lamport. Distributed Snapshots: Determining the Global States of a Distributed System
Assignment 0 due by 6PM.

Other introductory readings:
A1: Daniel
S1: Stewart

A2: Kuba
S2: Jodi
Sep 15
Thu
Remote procedure calls
  1. Birrell and Nelson. Implementing Remote Procedure Calls
Assignment 1 due by 6PM.
A1: Amanda
S1: Ivan
Sep 20
Tue
Distributed state
  1. Ousterhout. The Role of Distributed State
A1: Clement
S1: Nick
Sep 22
Thu
State machine replication
  1. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial
In-class project speed-dating (come with ideas!)
A1: Dylan
S1: Al
Sep 27
Tue
Byzantine Fault Tolerance
  1. Lamport, Shostak, and Pease. The Byzantine Generals Problem
A1: Soheil
S1: Zipeng
Sep 29
Thu
Byzantine Fault Tolerance
  1. Castro and Liskov. Practical Byzantine Fault Tolerance
Related:
A1: Rain
S1: Joseph
Oct 4
Tue
Paxos consensus protocol
  1. Lamport. Paxos Made Simple
  2. Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems
Project proposal drafts due by 6PM via email.

Related:
A1: Al
S1: Jodi

A2: Ivan
S2: Kuba
Oct 6
Thu
Raft consensus protocol
  1. Ongaro and Ousterhout. In Search of an Understandable Consensus Algorithm
Related:
A1: Kuba
S1: Zipeng
Oct 11
Tue
Distributed memory coherence
  1. Li and Hudak. Memory Coherence in Shared Virtual Memory Systems
A1: Rodrigo
S1: Soheil
Oct 13
Thu
Distributed shared memory

Ivan out of town (substitute: Mihir)

  1. Feeley et al. Implementing global memory management in a workstation cluster
Related:
A1: Zheng
S1: Amon
Oct 18
Tue
Programming frameworks
  1. Jul et al. Fine-grained mobility in the Emerald system
Project proposal due by 6PM via email.

Related:
A1: Joseph
S1: Stewart
Oct 20
Thu
  1. Eugster et al. The Many Faces of Publish/Subscribe
Assignment 2 due by 6PM via email.
A1: Daniel
S1: Rodrigo
Oct 25
Tue
Distributed Hash Tables
  1. Stoica et al. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
  2. Zave. Using Lightweight Modeling To Understand Chord
Related:
A1: Rain
S1: Zipeng

A2: Nick
S2: Joseph
Oct 27
Thu
P2P: BitTorrent
  1. Cohen. Incentives build robustness in BitTorrent
  2. Levin et al. BitTorrent is an Auction: Analyzing and Improving BitTorrent's Incentives
A1: Jodi
S1: Kuba

A2: Zipeng
S2: Clement
Nov 1
Tue
CAP theorem
  1. Gilbert and Lynch. Perspectives on the CAP Theorem
  2. Brewer. CAP Twelve Years Later: How the "Rules" Have Changed
A1: Amanda
S1: Dylan

A2: Zheng
S2: Amon
Nov 3
Thu
  1. Corbett et al. Spanner: Google's Globally-Distributed Database
A1: Amanda
S1: Soheil
Nov 8
Tue
Weak consistency
  1. Demers et al. Epidemic algorithms for replicated database maintenance
  2. Vogels. Eventually consistent
Team emails to schedule a meeting are due by 6PM.

Related:
A1: Amon
S1: Rodrigo

A2: Daniel
S2: Zheng
Nov 10
Thu
CRDTs
  1. Shapiro et al. Conflict-free Replicated Data Types
Related:
A1: Stewart
S1: Clement
Nov 15
Tue
Tor

Ivan out of town (substitute: Patrick)

  1. Dingledine et al. Tor: The Second-Generation Onion Router
  2. Murdoch and Danezis. Low-Cost Traffic Analysis of Tor

  3. Assignment 3 due on Nov 16 by 6PM.
A1: Rain
S1: Jodi

A2: Amon
S2: Dylan
Nov 17
Thu
MapReduce

Ivan out of town (substitute: Nodir)

  1. Dean and Ghemawat. MapReduce: Simplified Data Processing on Large Clusters
  2. DeWitt and Stonebraker. MapReduce: A major step backwards
Write one review for both readings.

Related:
A1: Nick
S1: Al
Nov 22
Tue
Debugging
  1. Aguilera et al. Performance debugging for distributed systems of black boxes
  2. Sigelman et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
A1: Nick
S1: Zheng

A2: Amanda
S2: Dylan
Nov 24
Thu
  1. Yang et al. MODIST: Transparent Model Checking of Unmodified Distributed Systems.
A1: Joseph
S1: Stewart
Nov 29
Tue
Knowledge in a distributed environment
  1. Halpern and Moses. Knowledge and Common Knowledge in a Distributed Environment
A1: Soheil
S1: Al
Dec 1
Thu
P2P: BitCoin
  1. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System
  2. Nielsen. How the Bitcoin protocol actually works (2-column pdf version)
Write one review for both readings.
A1: Rain
S1: Clement
Dec 6
Tue
Project presentations 11AM-1PM
Dec 13
Tue
Project code and reports due by 6PM

Paper reviews

For each of the assigned readings in the schedule above you must compose a review/critique. (The schedule sometimes lists optional readings; you do not need to review these.) Reviews must be posted to HotCRP by 6PM the day before the class in which they are discussed.

Reviews should engage with the reading at a deep level. Write you review with an eye towards contributing something interesting to the discussion in class the following day. See paper reviews advice for more information.

Reviews are graded using 1 binary scale:

  • 1 : the review engages with the reading in some depth
  • 0 : no review was posted, or the review is unreadable, off topic, or offers no insight into the reading and/or is a summary of the reading.

Late reviews will not be accepted.

Assignments

There are three assignments. All assignments that require programming must be completed in Go (I will use Go 1.5+ to compile your code). Each student must work independently and submit their own solution source code (see collaboration guidelines at the bottom of this page).

Solution must be submitted to the instructor as an email attachment by 6PM of the day of the deadline. For this email use the subject line template "538B-2016w1 Assignment X: Name", where X is the assignment number and Name is the student's last name. Special instructions for compiling/running the code should be included as a README file or in the body of the email. The attachment can be a .go file or an archive (e.g., .zip or .tgz).

Assignments will be marked based on functionality: coding style and solution approach will not be marked. Full marks will be given to assignments that behave according to the given specifications. Partial marks will be given to assignments that only partially fulfill the specifications.

Assignment deadlines are listed in the schedule above and below. Assignment descriptions will be linked to from this page once they are available.

A late assignments will lose 20% of its mark for every 12 hour period that the assignment is late.

Project

The project must address a non-trivial problem relevant to distributed systems. The project must include a substantial software effort in Go and must be done in a team of 2 or 3 students. As well, the project must:

  • Use GoVector library for maintaining vector clocks
  • Produce textual logs that can be visualized using the ShiViz tool
  • Utilize the Dinv tool and includes a set of scripts to induce a set of executions (that run on a single machine) for which Dinv infers at least one correctness invariant for your system.

Here are some projects ideas (do not limit yourself to these!):

  • Build a fault-tolerant parallel computing platform.
  • Build a peer-to-peer version of DropBox.
  • Build a record-replay tool for distributed Go programs.
  • Build a distributed state assertions library for Go.
  • Build an anonymity system based on onion routing.
  • Build a robust prototype of an interesting routing, replication, fault tolerance, leader election or other distributed algorithm from our readings, and thoroughly evaluate its performance, availability, fault tolerance, etc.

There are four project deliverables:

  • Project proposal: a paper detailing the problem, your proposed approach/solution, and a realistic timeline for your team. See proposal advice for more information.
  • Prototype implementation: must be done in Go and must involve substantial development effort. The final prototype must be shared with the instructor (at the end of the term), preferably as an hg or git repository.
  • Project presentation: a 15 minute presentation describing your project. Each team member must present for about 5 minutes. The presentation will be followed by a 5 minute Q/A period. I encourage you to demo your project during the presentation.
  • Project report: a paper detailing the problem, your approach/solution, your prototype, and an evaluation of the prototype. Along with the report you should submit 2 logs that are compatible with ShiViz and which demonstrate an interesting execution of your system. Each log must come with a description of the execution and why it is interesting. The final report (due at the end of the term) should be no longer than 10 pages and should resemble a research paper.

The project is structured as a series of regularly occurring deadlines, listed in the schedule above and below. Do not miss these! The deadline deliverable must be submitted by email to the instructor by 6PM on the day of the deadline.

  • Oct 04 : Project proposal drafts
  • Oct 18 : Project proposals
  • Nov 8 : Each team must send email to schedule a private meeting with instructor to discuss project status.
  • Dec 6 : Project presentations (15 minutes per team) during 11AM-1PM
  • Dec 13 : Project code and reports

Grading

Final course mark will be based off of:

  • Class participation: 20%
  • Paper reviews: 30%
  • Assignments: 10%
    • A1: 2%
    • A2: 4%
    • A3: 4%
  • Project: 40%
    • Proposal: 10% (team)
    • Presentation: 2.5% (team) + 2.5% (individual)
    • Final report: 25% (team)

Note that the project must be a team effort. The team's mark for the proposal and final report is the same for all team members. For project presentations each team member will receive a team mark and an individual mark.

The mark for class participation is based on three factors:

  • Regular course attendance
  • Regular participation in the in-class paper discussions
  • Leading of discussion for advocate/skeptic assignments

How to do well in this course

Be prepared to participate in in-class discussion. This is a seminar-style course, which means that most of the class time will be devoted to discussion. The best way to prepare for class is to read the assigned paper(s), write a thoughtful review, and then read and carefully consider the reviews submitted by your peers. Periodically re-read the readings from the first day of class and work to improve your paper reading and reviewing abilities.

Invest the time into learning Go. As with any new programming language, your proficiency in the language will highly correlate with the amount of code you write: practice counts for a lot!

Plan you reading time. The readings will likely challenge you. I recommend allocating an explicit time slot each week for reading the papers and for thinking about the papers. Note that some readings will be more difficult than others. Jump ahead and note the readings that are particularly long, theoretical, or may be especially challenging to you.

Invest time into the project. Do not underestimate the importance of a thorough (and interesting!) project proposal. Proposal write-ups that are vague or are incomplete will not be accepted. Put in consistent and weekly effort into the project. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Be proactive. There are no explicit office hours for this course. Email and schedule a time to meet with the instructor to discuss the course, assignments, etc.

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Paper reviews. Paper reviews must be written individually. You are free to discuss the readings with other students, but write your reviews on your own. Cite and attribute points from discussions with other students or external sources that you have read in your review.

Assignments. You can talk with other students about the assignments. You cannot look at each other's code and you cannot share code. You can use Google and other resources to find code examples, but you should not rely on these directly by copy and pasting: you must implement your own solution from scratch. You can use standard Go libraries (as long as they don't directly solve the assignment for you); you cannot use any external libraries that are not distributed with Go.

If you do discuss the assignment with someone or use external resources (e.g., a StackOverflow question) then you must cite and attribute your sources in a README distributed with your assignment (or in body of email with your solution). Stating the source is insufficient: you should explain what was discussed/found and how you have used this information in your assignment solution.

Projects. The policy for project is more liberal than for assignments: you are free to use any code you find in your project (e.g., you can use external libraries that are not distributed with Go). However, a non-trivial fraction of functionality in your prototype must be constructed by your team. As with assignments, cite and attribute sources of the code that you borrow/utilize in your project.