Distributed Systems

CPSC 416, Winter 2017

Mon/Wed/Fri 3:00-4:00 PM, HDP 110, UBC course page

Course piazza

Office hours:
Stewart: Mon 2:00-3:00 (X339)
Ivan: Tue 3:00-4:00 (ICICS 327)
Jodi: Thu 1:00-2:00 (X153)
Amanda: Fri 2:00-3:00 (Demco Table 1)


Course description

Leslie Lamport, a computer scientist who won the 2013 ACM Turing Award, gave the following definition of a distributed system:

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

Yet distribution provides numerous benefits. A system becomes more fault tolerant when it has fewer single points of failure and no centralized components. Adding physical nodes gives the system more capacity, making it more scalable and able to handle more load. Distribution can also reduce latency through geographic diversity, by placing resources closer to the clients that use them.

Achieving these benefits is not easy. As the quote above illustrates, distributed systems can fail in complex ways and these systems are more difficult to build, test, and understand than centralized systems.

This course will introduce you to a broad range of topics in distributed systems. The tentative topics are listed in the schedule below. For the most part this will be a lecture-style course. However, distributed system concepts are notoriously challenging to internalize without first-hand experience. The emphasis of this course, therefore, will be on building distributed system prototypes, small and large.

Course prerequisites: CPSC 317 (networks) and CPSC 313 (computer hardware and operating systems).

Course staff: Ivan Beschastnikh (Instructor), Amanda Carbonari (TA), Stewart Grant (TA), Rohin Patel (TA), Jodi Spacek (TA).

Go programming language

In this course we will use the Go programming language exclusively for all assignments. Learning a new programming language is an important skill, and you will practice it in this course. We will spend some time on January 13th covering the basics of Go. However, I expect you to learn the language mostly on your own.

Amanda and Stewart led an in-class Go tutorial. Here is the recorded version: part 1 and part 2.

Textbooks

There are three optional books for this course:

  1. The Go Programming Language
  2. Programming in Go
  3. Distributed Systems: Principles and Paradigms (2nd Edition)
Although there are many tutorials introducing Go and the online Go documentation is well developed, some of you may find the first two books on the list helpful for a step-by-step introduction to Go.

Communication

Use the course Piazza for all course-related communication. The Piazza also supports private posts that you can use to communicate with the instructor and the TAs.

Course-level learning goals

The course will provide an opportunity for participants to

  • understand key principles in designing and implementing distributed systems
  • reason about problems that involve distributed components
  • become familiar with important techniques for solving problems that arise in distributed contexts
  • build distributed system prototypes using the Go programming language

Schedule (a work in progress)

Jan 4
Wed
Introduction and course overview [slides]

Read through Go resources prior to class, and practice as much Go as you can.

Jan 6
Fri
Networks review 1/2 [slides]
Jan 9
Mon
Networks review 2/2 and start of RPC [slides]
Jan 11
Wed
RPC continued [slides]

Go RPC client and server code examples.
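
For a self-contained picture of the same idea, here is a minimal sketch using Go's standard net/rpc package. The Arith service, its Multiply method, and the Args type are hypothetical names chosen for illustration, and the server listens on an OS-assigned localhost port so that client and server can run in one process.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

// Args holds the operands for a remote multiplication (hypothetical example service).
type Args struct {
	A, B int
}

// Arith is the receiver whose exported methods become RPC endpoints.
type Arith struct{}

// Multiply has the shape net/rpc requires: exported method, an argument,
// a pointer for the reply, and an error return.
func (t *Arith) Multiply(args *Args, reply *int) error {
	*reply = args.A * args.B
	return nil
}

// startServer registers Arith on a fresh rpc.Server and serves connections
// on an OS-chosen localhost port. It returns the address clients should dial.
func startServer() (string, error) {
	server := rpc.NewServer()
	if err := server.Register(new(Arith)); err != nil {
		return "", err
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	// Accept loops forever, serving each connection in its own goroutine.
	go server.Accept(ln)
	return ln.Addr().String(), nil
}

// callMultiply dials the server and performs one synchronous RPC.
func callMultiply(addr string, a, b int) (int, error) {
	client, err := rpc.Dial("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer client.Close()
	var product int
	// The service name is the receiver's type name: "Arith.Multiply".
	err = client.Call("Arith.Multiply", &Args{A: a, B: b}, &product)
	return product, err
}

func main() {
	addr, err := startServer()
	if err != nil {
		log.Fatal(err)
	}
	product, err := callMultiply(addr, 6, 7)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("6 * 7 =", product)
}
```

Because net/rpc only exposes methods of the form func (t *T) Name(args T1, reply *T2) error, Multiply writes its result through the reply pointer instead of returning it.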

Jan 13
Fri
In-class Go tutorial (led by TAs)

Recorded tutorial: part 1 and part 2.

Assignment 1 due by 9PM.

Jan 16
Mon
Distributed file systems [NFS and AFS overview] [slides]
Jan 18
Wed
Distributed file systems [NFS and AFS designs] [slides]
Jan 20
Fri
Distributed file systems [NFS and AFS designs, contd.] [slides]
Jan 23
Mon
Distributed file systems [CODA, disconnected operation] [slides]
Jan 25
Wed
Time Synchronization [Berkeley and Cristian's algs, NTP] [slides]

Assignment 2 due by 9PM.

Jan 27
Fri
A2 solution review and A3 overview
Jan 30
Mon
Logical time [Lamport and vector clocks] [slides]
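
The update rules for both kinds of logical clocks fit in a short sketch. It assumes string process IDs and map-based vector clocks (hypothetical choices; the lecture's notation may differ) and replays one run: process A has a local event and then sends a message, while B has a local event and then receives A's message.

```go
package main

import "fmt"

// LamportClock is one process's scalar logical clock.
type LamportClock struct{ time int }

// Tick advances the clock for a local event or a send and returns the new timestamp.
func (c *LamportClock) Tick() int {
	c.time++
	return c.time
}

// Receive jumps past the message's timestamp, then advances.
func (c *LamportClock) Receive(msgTime int) int {
	if msgTime > c.time {
		c.time = msgTime
	}
	c.time++
	return c.time
}

// VectorClock maps a (hypothetical) process ID to the events seen from that process.
type VectorClock map[string]int

// Tick records a local event or send at process self.
func (v VectorClock) Tick(self string) { v[self]++ }

// Copy snapshots the clock so it can travel in a message without aliasing.
func (v VectorClock) Copy() VectorClock {
	c := make(VectorClock, len(v))
	for id, t := range v {
		c[id] = t
	}
	return c
}

// Merge takes the element-wise max with the incoming clock, then ticks self.
func (v VectorClock) Merge(self string, incoming VectorClock) {
	for id, t := range incoming {
		if t > v[id] {
			v[id] = t
		}
	}
	v[self]++
}

// HappenedBefore reports v -> w: every entry of v is <= w's and at least one
// is strictly less. Missing entries count as zero.
func (v VectorClock) HappenedBefore(w VectorClock) bool {
	strictlyLess := false
	for id, t := range v {
		if t > w[id] {
			return false
		}
		if t < w[id] {
			strictlyLess = true
		}
	}
	for id, t := range w {
		if _, ok := v[id]; !ok && t > 0 {
			strictlyLess = true
		}
	}
	return strictlyLess
}

func main() {
	var a, b LamportClock
	a.Tick()         // A: local event
	sent := a.Tick() // A: send
	b.Tick()         // B: local event
	fmt.Println("Lamport timestamp of B's receive:", b.Receive(sent))

	va, vb := VectorClock{}, VectorClock{}
	va.Tick("A")
	va.Tick("A")
	msg := va.Copy() // A's send
	vb.Tick("B")
	bLocal := vb.Copy() // B's local event
	vb.Merge("B", msg)  // B's receive
	fmt.Println("send -> receive:", msg.HappenedBefore(vb))
	fmt.Println("B's local event and A's send concurrent:",
		!bLocal.HappenedBefore(msg) && !msg.HappenedBefore(bLocal))
}
```

Lamport timestamps order B's local event (1) before A's send (2) even though the two are concurrent; comparing the vector clocks {B:1} and {A:2} shows that neither happened before the other.
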
Feb 1
Wed
Distributed mutual exclusion [slides]
    Readings:
  • TBD
Feb 3
Fri
Fault Tolerance, local faults [slides]
Feb 6
Mon
Fault Tolerance, local faults (continued) [slides]
Feb 8
Wed
RAID [slides]

Assignment 3 due by 9PM.

Feb 10
Fri
RAID contd. and primary-backup [slides]
Feb 13
Mon
No class (Family Day)
Feb 15
Wed
Content Distribution Networks (CDNs) [slides]

Assignment 4 released.

Feb 17
Fri
Review assignment 3 solution
Feb 20
Mon
No class (UBC reading break); no office hours
Feb 22
Wed
No class (UBC reading break); no office hours
Feb 24
Fri
No class (UBC reading break); no office hours
Feb 27
Mon
Industry talk by J. Peter MacKay

Assignment 4 due by 9PM.

Mar 1
Wed
Peer-to-Peer (P2P) [slides]

Assignment 5 released.

Mar 3
Fri
Peer-to-Peer (P2P), part 2 [slides]
Mar 6
Mon
Peer-to-Peer (P2P), part 3, Chord DHT
Mar 8
Wed
Transactions, part 1: ACID semantics and 2-phase locking [slides]
Mar 10
Fri
PQ 29 designs discussion

Assignment 5 due by 9PM.

Mar 13
Mon
Assignment 6 overview

Assignment 6 released.

Mar 15
Wed
Transactions continued, part 2: logging [slides]
Mar 17
Fri
Transactions continued, part 3: write-ahead logging and two-phase commit [slides]
Mar 20
Mon
2PC in other topologies [slides]
Mar 22
Wed
Three phase commit (3PC) [slides1, slides2]
Mar 24
Fri
Distributed P2P ledger: Bitcoin [notes]

Assignment 6 due at midnight.

Mar 27
Mon
A6 solution discussion and A7 overview

Assignment 7 released.

Mar 29
Wed
Industry talk by Stuart Ritchie (Arista Networks)
Dealing with message backlogs

A common problem is producers that generate messages faster than consumers can process them. How does the system buffer unread messages? Memory is finite. Can we cap the size of the backlog? How do we deal with a producer that hits the limit, or a consumer that is too slow? How can we design systems in which the producer and consumer can run reliably at different speeds?
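
One way to make the questions above concrete: cap the backlog with a bounded queue and pick an explicit policy for when it fills up. This sketch uses a Go buffered channel; the Backlog type and its methods are hypothetical, and it shows two common policies, blocking the producer (backpressure) and dropping new messages, not necessarily the approach discussed in the talk.

```go
package main

import (
	"fmt"
	"sync"
)

// Backlog is a hypothetical bounded message queue backed by a buffered channel.
type Backlog struct {
	ch      chan string
	mu      sync.Mutex
	dropped int
}

// NewBacklog creates a backlog that holds at most limit unread messages.
func NewBacklog(limit int) *Backlog {
	return &Backlog{ch: make(chan string, limit)}
}

// Put applies backpressure: it blocks while the backlog is full, so the
// producer can only run as fast as the consumer drains messages.
func (b *Backlog) Put(msg string) { b.ch <- msg }

// TryPut never blocks: when the backlog is full the message is dropped and counted.
func (b *Backlog) TryPut(msg string) bool {
	select {
	case b.ch <- msg:
		return true
	default:
		b.mu.Lock()
		b.dropped++
		b.mu.Unlock()
		return false
	}
}

// Dropped reports how many messages TryPut has discarded.
func (b *Backlog) Dropped() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.dropped
}

// Messages is the consumer's view of the backlog.
func (b *Backlog) Messages() <-chan string { return b.ch }

// Close tells consumers no more messages will arrive; producers must not Put afterwards.
func (b *Backlog) Close() { close(b.ch) }

func main() {
	b := NewBacklog(2)

	// Drop policy: with no consumer running, only the first two messages fit.
	for i := 0; i < 5; i++ {
		b.TryPut(fmt.Sprintf("m%d", i))
	}
	fmt.Println("dropped:", b.Dropped())

	// Backpressure policy: Put blocks whenever two messages are waiting,
	// until the consumer goroutine makes room.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for msg := range b.Messages() {
			fmt.Println("consumed", msg)
		}
	}()
	for i := 5; i < 8; i++ {
		b.Put(fmt.Sprintf("m%d", i))
	}
	b.Close()
	wg.Wait()
}
```

Blocking keeps every message but ties the producer's speed to the slowest consumer; dropping keeps the producer fast at the cost of lost messages, so the drop count is worth reporting.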

Mar 31
Fri
A7 design discussion, TA evaluations
Apr 3
Mon
Quorum replication; Paxos protocol [slides]
Apr 5
Wed
Paxos final thoughts; Distributed systems design considerations [slides]

Last lecture

Apr 14
Fri

Assignment 7 due at midnight.

Apr 19
Wed

Final exam 12PM, location SRC B

Go resources

Go is a systems language designed at Google. It is especially well suited to building distributed systems. As with any language, the fastest way to become proficient at Go is to put in the time writing programs in Go. Here are some resources to get you started:

We will be using Go version TBD (the latest version that support was able to get working on ugrad machines).

Assignments

There are seven assignments. All assignments must be completed in Go. You must work independently on assignments 1-5 and each student must submit their own solution source code. For assignments 6 and 7 you will work in teams of two. See collaboration guidelines at the bottom of this page for more information.

Solutions must be submitted using the stash server by 9PM on the day of the deadline (except for A6/A7, which are due at midnight). Special instructions for compiling/running the code should be included in a README.txt file.

Assignments will be marked primarily on functionality. Partial marks will be given to assignments that partially fulfill the specifications. It is in your best interest to properly comment and document your code to receive appropriate partial credit. All solutions must be formatted using gofmt.

To access the hand-in git repository for assignment X as student with undergrad userid UID, run the following command:

git clone https://stash.ugrad.cs.ubc.ca:8443/git/CS416_2016W2_/assignmentX_UID.git

Add your solution (and don't forget to push!) to the repository by the deadline.

Assignment deadlines are listed in the schedule above and in the list below. Assignment descriptions will be linked from this page once they are available.

  • Jan 13 : Assignment 1 (Goldilocks fortune; individual)
  • Jan 25 : Assignment 2 (Resource distribution; individual)
  • Feb 8 : Assignment 3 (Resource distribution with peer churn; individual)
  • Feb 27 : Assignment 4 (Distributed measurement on Azure; individual)
  • Mar 10 : Assignment 5 (Distributed web crawler on Azure; individual)
  • Mar 24 : Assignment 6 (Transactional k-v store on Azure; in pairs)
  • Apr 14 : Assignment 7 (A block-chain-based k-v store on Azure; in pairs)

Exam

To help you practice for the final exam we will typically go over 1-2 questions at the start of each class. You can download the set of practice questions we have covered so far (updated continuously).

Final exam: April 19, 12:00 PM, Location SRC B

Grading

The final course mark will be based on:

  • Assignments: 70%
    • A1: 5%
    • A2: 10%
    • A3: 10%
    • A4: 5%
    • A5: 10%
    • A6: 15% (team)
    • A7: 15% (team)
  • Final exam: 30%

Note that A6 and A7 must be done in teams of two people. The team's mark for these deliverables is the same for both team members.

Late policy

The deadline for any assignment can be extended by one day with a 20% penalty to the mark. Assignments will not be accepted 24 hours past the original deadline.

If you have an emergency (e.g., a health issue) that prevents you from meeting a deadline, you must notify the instructor before the deadline.

How to do well in this course

Learn Go early and practice it regularly. Learning a new language while being time constrained is stressful and not fun. Since the assignments rapidly increase in their difficulty, it will be to your advantage to learn Go as quickly as possible and to learn it well. The posted Go resources are a great starting point, but reading is no substitute for practice, bug, debug, practice, practice, bug, coffee, debug, practice, ...

Do not skimp on software engineering. Distributed systems are hard. They are hard to understand, to build, to debug, to run, to trace, to document, etc. Do not make your life any more difficult. Use best practices from software engineering to help you in this course. Write unit and integration tests, use version control, document your code with comments, write small prototypes, refactor your code, make your code readable and easy to run and debug. If you fail to follow best practices, they will come back to bite you later on. Unfortunately, this course will not explicitly teach you these best practices, but you probably took a course that introduced you to these concepts. If you have any questions, just ask us on Piazza.

Choose your teammates wisely. The last two assignments (30% of your mark) depend critically on your ability to work effectively with one other student. You are responsible for resolving personal and technical differences among teammates on your own. Let us know as early as possible if you have team concerns, before they turn into crises.

Reach out for success. This is intended to be a challenging fourth-year course, but that does not mean that you have to work through it on your own! The course Piazza should be your first stop for all technical questions. The course has specific office hours (see the top of the page), but the TAs and I are flexible. Send any of us an email to schedule a time to discuss the course, the assignments, etc. University students often encounter setbacks that can impact academic performance. Discuss your situation with us or an academic advisor as early as possible. For help in addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit this link.

Academic honesty and collaboration guidelines

The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

Assignments 1-5: You must work by yourself. No code sharing is allowed. You can use any code examples that you find on the internet for help, but use these as a starting point and write your own code. If you have consulted a specific resource extensively, then note this in the README file (better safe than sorry).

Assignment 6: You must work in a team of two. Your team will receive a single mark for the assignment. No code sharing between teams is allowed. You can use any code examples that you find on the internet for help, but use these as a starting point and write your own code. If you have consulted a specific resource extensively, then note this in the README file (better safe than sorry).

Assignment 7: Same as assignment 6, except that you cannot use the same team.

Acknowledgments

Many of the materials used in this course are derived from CMU's 15-440: Distributed Systems course from Spring 2014, and are used with permission from the content authors.