Distributed Systems

Distributed Systems

538B, Winter 2015, Ivan Beschastnikh (bestchai@cs.ubc.ca)

Tues/Thur 2-3:30PM, DMP 101, UBC course page

Office hours by appointment


Course description

Leslie Lamport, a computer scientist at MSR who won the 2013 ACM Turing Award, gave the following definition of a distributed system:

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

Yet, distribution provides numerous benefits. A system becomes more fault tolerant if there are fewer points of failure and it has no centralized components. By extending the system with more physical nodes the system gains performance and becomes more scalable, capable of handling more load. Distribution can also improve latency, by improving geographic diversity, by placing resources closer to clients who use the system.

Achieving these benefits is not easy. As the quote above illustrates, distributed systems can fail in complex ways and these systems are more difficult to build, test, and understand than centralized systems.

This course will cover a broad range of topics, grounded in classic papers that trail-blazed concepts like remote procedure calls, distributed consensus, disconnected operation, and many others. In addition to the readings the first half of the term will include hands-on assignments. These will get you to apply the concepts from the readings to build small but practical distributed system prototypes of your very own. In the second half of the term the assignment will be replaced with an open-ended team project.

In this course we will be exclusively using the Go programming language for all assignments and the project. Learning a new programming language is an important skill. You will practice it in this course. Note that Go will not be explicitly covered by the course. I assume that students can pick up the necessary language features as needed.

This course satisfies the PhD "Computer Systems and Design" breadth requirement.

Course-level learning goals

The course will provide an opportunity for participants to
  • understand key principles involved in designing and implementing distributed systems
  • reason about problems that involve distributed components
  • become familiar with important techniques for solving problems that arise in distributed contexts
  • develop advanced distributed systems using the Go programming language

Format

This seminar-style course will be primarily driven through in-class discussion of the readings. The readings are available online and are linked to from the schedule. Most of the readings are research papers; there is no textbook. The course will also include periodic assignments that are to be done individually, and an open-ended course project that must be completed as part of a team.

  • The discussion in each class will focus on 1-2 papers
  • The discussion for each paper is led by two students, an advocate and a skeptic
  • Everyone except for the paper advocate and skeptic writes a review of each of the assigned papers, due at 9PM the day before class

Advocate and skeptic are in charge of leading the discussion for a paper. Do not use slides. Instead, give a concise summary of the reading and pose questions to dig deeper into why the paper was written, what can we take away from it, how does the paper relate to the course, why it is or is not the final word on the subject, etc.

Advocate and skeptic need to review any other paper(s) that are discussed on same day, but they do not need to write a review for the paper they are leading.

If you are in the course, you should start with the following three tasks:

  • Complete the Go tutorial and start learning Go.
  • Create an account on the HotCRP server. This is the system where you are responsible for entering reviews of the papers that we discuss in class.
  • Sign up for the class Piazza instance. We will use this for all class communication.

Schedule (a work in progress)

Jan 6
Tue
Introductions, course overview, and academic paper reading
  1. How to Read a Paper, Srinivasan Keshav.
  2. Writing reviews for systems conferences, Timothy Roscoe.
Do not write reviews for these readings.
Jan 8
Thu
Basic principles
  1. Google. Introduction to Distributed System Design
    In your paper review (on HotCRP) briefly respond to exercises 2,3,4.
Assignment 1 due by 9PM.
A1: Peter B.
S1: Peter C.
Jan 13
Tue
Clocks and snapshots
  1. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System
  2. Mandy and Lamport. Distributed Snapshots: Determining the Global States of a Distributed System
A1: Sam
S1: Wali

A2: Jonathan
S2: Nodir
Jan 15
Thu
Remote procedure calls
  1. Birrell and Nelson. Implementing Remote Procedure Calls
A1: Jeff
S1: Felipe
Jan 20
Tue
Transactions
  1. Bernstein, Hadzilacos, and Goodman. Chapter 7 (Distributed Recovery) in Concurrency Control and Recovery in Database Systems.
Focus on the design and properties of 2PC and 3PC protocols.
In-class project speed-dating (come with ideas!)

Assignment 2 due by 9PM.
A1: Sampoorna
S1: Kent
Jan 22
Thu
State machine replication
  1. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial
A1: Neil
S1: Glen
Jan 27
Tue
Cache coherence
  1. Ousterhout. The Role of Distributed State
  2. Anderson et al. Serverless Network File Systems
A1:Ravjot
S1:Peter

A2:Michael
S2:Nodir
Jan 29
Thu
Memory coherence
  1. Li et al. Memory Coherence in Shared Virtual Memory Systems
A1:Nabil
S1:Jeff
Feb 3
Tue
  1. Saltzer, Reed and Clark. End-to-End Arguments in System Design
  2. Cheriton and Skeen. Understanding the Limitations of Causally and Totally Ordered Multicast
Assignment 3 due by 9PM.
Optional: Birman's response to Cheriton and Skeen.
A1:Joyita
S1:Felipe

A2:Sampoorna
S2:Nodir
Feb 5
Thu
Programming frameworks
  1. Adya et al. Thialfi: A Client Notification Service for Internet-Scale Applications
A1:Giovanni
S1:Jeff
Feb 10
Tue
Programming frameworks
  1. Jul et al. Fine-grained mobility in the Emerald system
  2. Liskov. Distributed Programming in Argus
Project proposal drafts due by 9PM via email.
A1:Felipe
S1:Jonathan

A2:Shuochen
S2:Giovanni
Feb 12
Thu
Programming frameworks
  1. Eugster et al. The Many Faces of Publish/Subscribe
A1:Ravjot
S1:Joyita
Feb 17
Tue
No class (UBC midterm break)
Feb 19
Thu
No class (UBC midterm break)
Feb 24
Tue
Paxos in theory
  1. Fisher, Lynch, and Paterson. Impossibility of Distributed Consensus with One Faulty Process
  2. Lamport. Paxos Made Simple
Optional: Lamport's original paper, Paxos made practical, viewstamped replication
Comments/feedback on proposal draft returned.
A1:Keheliya
S1:Sam

A2:Brittany
S2:Neil
Feb 26
Thu
Paxos in practice
  1. Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems
Assignment 4 due by 9PM.
Optional: ZooKeeper, and Paxos for data storage
A1:Michael
S1:Brittany
Mar 3
Tue
Byzantine Fault Tolerance
  1. Lamport, Shostak, and Pease. The Byzantine Generals Problem
Project proposals due by 9PM via email.
A1:Neil
S1:Scott
Mar 5
Thu
Byzantine Fault Tolerance
  1. Castro and Liskov. Practical Byzantine Fault Tolerance
A1:Sam
S1:Wali
Mar 10
Tue
Weakly connected systems
  1. Kistler and Satyanarayanan. Disconnected Operation in the Coda File System
  2. Terry et al. Managing update conflicts in Bayou, a weakly connected replicated storage system
A1:Arash
S1:Brittany

A2:Jianing
S2:Peter
Mar 12
Thu
Weak consistency
  1. Demers et al. Epidemic algorithms for replicated database maintenance
A1:Kent
S1:Jianing
Mar 17
Tue
CAP theorem
  1. Brewer. CAP Twelve Years Later: How the "Rules" Have Changed
  2. Sirer. Consistency Alphabet Soup
Each team must send email by 9PM to schedule a private meeting with instructor to discuss project status.
A1:Sampoorna
S1:Joyita

A2:Jonathan
S2:Nabil
Mar 19
Thu
CAP practice
  1. Corbett et al. Spanner: Google's Globally-Distributed Database
A1:Nabil
S1:Michael
Mar 24
Tue
Distributed Hash Tables
  1. DeCandia et al. Dynamo: Amazon's Highly Available Key-Value Store
A1:Keheliya
S1:Arash
Mar 26
Thu
DHTs and storage
  1. Dabek. Wide-area cooperative storage with CFS
A1:Arash
S1:Giovanni
Mar 31
Tue
P2P: BitTorrent
  1. Cohen. Incentives build robustness in BitTorrent
  2. Piatek et al. Do incentives build robustness in BitTorrent?
A1:Scott
S1:Ravjot

A2:Jianing
S2:Shuochen
Apr 2
Thu
P2P: BitCoin
  1. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System
  2. Nielsen. How the Bitcoin protocol actually works (2-column pdf version)
Write one review for readings #1 and #2.
A1/2:Scott
S1/2:Keheliya
Apr 7
Tue
No class
Apr 9
Thu
No class
Apr 14
Tue
Project presentations (2-3:30 PM, Room TBD).
Apr 16
Thu
Project presentations continued (2-3:30 PM, Room TBD).
Apr 23
Thu
Project code and reports due by 9PM via email.

Paper reviews

For each of the assigned readings in the schedule above you must compose a review/critique. (The schedule sometimes lists optional readings; you do not need to review these.) Reviews must be posted to HotCRP by 9PM the day before the class in which they are discussed.

Reviews should engage with the reading at a deep level. Write you review with an eye towards contributing something interesting to the discussion in class the following day. See paper reviews advice for more information.

Reviews are graded using the following scale:

  • 3 : the review engages with the reading in great depth
  • 2 : the review engages with the reading in some depth
  • 1 : the review is shallow, for example it is a summary of the reading
  • 0 : no review was posted, or the review is unreadable, off topic, or offers no insight into the reading and/or is a simple re-statement of some text in the reading (e.g., the abstract).

Late reviews will lose 1 point for every 12 hour period that the review is late.

Assignments

There are four assignments. All assignments must be completed in Go (I will use Go 1.4 to compile your code). Each student must work independently and submit their own solution source code (see collaboration guidelines at the bottom of this page).

Solution must be submitted to the instructor as an email attachment by 9PM of the day of the deadline. For this email use the subject line template "538B-2014w2 Assignment X: Name", where X is the assignment number and Name is the student's last name. Special instructions for compiling/running the code should be included as a README file or in the body of the email. The attachment can be a .go file or an archive (e.g., .zip or .tgz).

Assignments will be marked based on functionality: coding style and solution approach will not be marked. Full marks will be given to assignments that behave according to the given specifications. Partial marks will be given to assignments that only partially fulfill the specifications.

Assignment deadlines are listed in the schedule above and below. Assignment descriptions will be linked to from this page once they are available.

A late assignments will lose 20% of its mark for every 12 hour period that the assignment is late.

Project

The project must address a non-trivial problem relevant to distributed systems. The project must include a substantial software effort in Go and must be done in a team of 2 or 3 students. Here are some projects ideas (do not limit yourself to these!):

  • Build a peer-to-peer DropBox clone.
  • Build a record-replay tool for distributed Go programs.
  • Build a distributed state assertions library for Go.
  • Build an anonymity system based on onion routing.
  • Implement a robust prototype of an interesting routing, replication, fault tolerance, leader election or other distributed algorithm from our readings, and thoroughly evaluate it.

There are four project deliverables:

  • Project proposal: a paper detailing the problem, your proposed approach/solution, and a realistic timeline for your team. See proposal advice for more information.
  • Prototype implementation: must be done in Go and must involve substantial development effort. The final prototype must be shared with the instructor (at the end of the term), preferably as an hg or git repository.
  • Project presentation: a 10-15 minute presentation describing your project (10 minutes for 2-person teams and 15 minutes for 3-person teams). Plan the presentation so that each team member talks for 5 minutes. The presentation will be followed by a 5 minute Q/A period. I encourage you to demo your project during the presentation.
  • Project report: a paper detailing the problem, your approach/solution, your prototype, and an evaluation of the prototype. The final report (due at the end of the term) should be no longer than 10 pages and should resemble a research paper.

The project is structured as a series of regularly occurring deadlines, listed in the schedule above and below. Do not miss these! The deadline deliverable must be submitted by email to the instructor by 9PM of the day of the deadline.

  • Feb 10 : Project proposal drafts
  • Mar 03 : Project proposals
  • Mar 17 : Each team must send email by 9PM to schedule a private meeting with instructor to discuss project status.
  • Apr 23 : Project code and reports

Grading

Final course mark will be based off of:

  • Class participation: 20%
  • Paper reviews: 30%
  • Assignments: 15%
    • A1: 2%
    • A2: 4%
    • A3: 4%
    • A4: 5%
  • Project: 35%
    • Proposal: 5% (team)
    • Presentation: 2.5% (team) + 2.5% (individual)
    • Final report: 25% (team)

Note that the project must be a team effort. The team's mark for the proposal and final report is the same for all team members. For project presentations each team member will receive a team mark and an individual mark.

The mark for class participation is based on three factors:

  • Regular course attendance
  • Regular participation in the in-class paper discussions
  • Leading of discussion for advocate/skeptic assignments

How to do well in this course

Be prepared to participate in in-class discussion. This is a seminar-style course, which means that most of the class time will be devoted to discussion. The best way to prepare for class is to read the assigned paper(s), write a thoughtful review, and then read and carefully consider the reviews submitted by your peers. Periodically re-read the readings from the first day of class and work to improve your paper reading and reviewing abilities.

Invest the time into learning Go. As with any new programming language, your proficiency in the language will highly correlate with the amount of code you write: practice counts for a lot!

Plan you reading time. The readings will likely challenge you. I recommend allocating an explicit time slot each week for reading the papers and for thinking about the papers. Note that some readings will be more difficult than others. Jump ahead and note the readings that are particularly long, theoretical, or may be especially challenging to you.

Invest time into the project. Do not underestimate the importance of a thorough (and interesting!) project proposal. Proposal write-ups that are vague or are incomplete will not be accepted. Put in consistent and weekly effort into the project. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Be proactive. There are no explicit office hours for this course. Email and schedule a time to meet with the instructor to discuss the course, assignments, etc.

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Paper reviews. Paper reviews must be written individually. You are free to discuss the readings with other students, but write your reviews on your own. Cite and attribute points from discussions with other students or external sources that you have read in your review.

Assignments. You can talk with other students about the assignments. You cannot look at each other's code and you cannot share code. You can use Google and other resources to find code examples, but you should not rely on these directly by copy and pasting: you must implement your own solution from scratch. You can use standard Go libraries (as long as they don't directly solve the assignment for you); you cannot use any external libraries that are not distributed with Go.

If you do discuss the assignment with someone or use external resources (e.g., a StackOverflow question) then you must cite and attribute your sources in a README distributed with your assignment (or in body of email with your solution). Stating the source is insufficient: you should explain what was discussed/found and how you have used this information in your assignment solution.

Projects. The policy for project is more liberal than for assignments: you are free to use any code you find in your project (e.g., you can use external libraries that are not distributed with Go). However, a non-trivial fraction of functionality in your prototype must be constructed by your team. As with assignments, cite and attribute sources of the code that you borrow/utilize in your project.