Distributed Systems

CPSC 416, Winter 2016

Mon/Wed/Fri 2-3:00PM, HDP 301, UBC course page

Course piazza

Office hours:
Ivan: Mon 3:30-4:30 (ICICS 327)
Lynsey: Thu 1:00-2:00 (ICCSX 139)
Patrick: Fri 3:00-4:00 (ICCSX 141)



This course has completed.
You may be looking for CPSC 416 2016 W2


Course description

Leslie Lamport, a computer scientist who won the 2013 ACM Turing Award, gave the following definition of a distributed system:

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

Yet, distribution provides numerous benefits. A system becomes more fault tolerant if it has fewer single points of failure and no centralized components. Extending the system with more physical nodes improves performance and scalability, letting it handle more load. Distribution can also reduce latency by increasing geographic diversity, placing resources closer to the clients who use the system.

Achieving these benefits is not easy. As the quote above illustrates, distributed systems can fail in complex ways and these systems are more difficult to build, test, and understand than centralized systems.

This course will introduce you to a broad range of topics in distributed systems. The tentative topics are listed in the schedule below. For the most part this will be a lecture-style course. However, distributed system concepts are notoriously challenging to internalize without first-hand experience. The emphasis of this course, therefore, will be on building distributed system prototypes, small and large. In the first half of the term you will do this through smaller assignments. In the second half of the term the assignments will be replaced with a (mostly) open-ended project.

Course pre-requisites: CPSC 317 (networks) and CPSC 313 (computer hardware and operating systems).

Course staff: Ivan Beschastnikh (Instructor), Patrick Colp (TA), Lynsey Haynes (TA).

Go programming language

In this course we will exclusively use the Go programming language for all assignments and the project. Learning a new programming language is an important skill. You will practice it in this course. We will spend some time at the beginning of the course covering the basics of Go. However, I will expect that you learn this language mostly on your own.

Textbooks

There are three optional books for this course:

  1. Go Programming Language
  2. Programming in Go
  3. Principles of Computer System Design (we will use Part II, which is online)
Although there are many tutorials introducing Go and the online Go documentation is well developed, some of you may find the first two books on the list helpful for a step-by-step introduction to Go. We will only use Part II from the third book on the list and all of the contents for this part are available online.

Communication

Use the course Piazza for all course-related communication. The Piazza also supports private posts that you can use to communicate with the instructor and the TAs. Use email for private issues related to the course.

Course-level learning goals

The course will provide an opportunity for participants to

  • understand key principles in designing and implementing distributed systems
  • reason about problems that involve distributed components
  • become familiar with important techniques for solving problems that arise in distributed contexts
  • build distributed system prototypes using the Go programming language

Schedule (a work in progress)

Jan 4
Mon
Introduction and course overview [slides]

Read through Go resources prior to class, and practice as much Go as you can.
Jan 6
Wed
Jan 8
Fri
Jan 11
Mon
Networks review 1/2 [slides]
Jan 13
Wed
Networks review 2/2 and start of RPC [slides]
Assignment 1 due by 9PM.
Jan 15
Fri
RPC continued [slides]
Go RPC client and server code examples.
Jan 18
Mon
Distributed file systems [NFS and AFS overview] [slides]

Deadline to withdraw without a W standing

Jan 20
Wed
Distributed file systems [NFS and AFS designs] [slides]
Jan 22
Fri
Distributed file systems [CODA, disconnected operation] [slides]
Assignment 2 due by 9PM.
Jan 25
Mon
Time Synchronization [Berkeley and Cristian's algs, NTP] [slides]
Jan 27
Wed
Logical time [Lamport and vector clocks] [slides]
Jan 29
Fri
Distributed mutual exclusion [slides]
Feb 1
Mon
Fault Tolerance, local faults [slides]
Assignment 3 due by 9PM.
Feb 3
Wed
Fault Tolerance, local faults (continued) [slides]
Feb 5
Fri
RAID [slides]
Feb 8
Mon
No class (Family Day)
Ivan's office hours moved to Tue, Feb 9th, 2-3PM.
Feb 10
Wed
Replication; primary-backup [slides]
Feb 12
Fri
Quorum replication; Paxos protocol [slides]

Assignment 4 due by 9PM.
Project team roster due.
Feb 15
Mon
No class (UBC reading break); no office hours
Feb 17
Wed
No class (UBC reading break); no office hours
Feb 19
Fri
No class (UBC reading break); no office hours
Feb 22
Mon
Assignment 3 solution discussion, project/SWOT analysis discussion, GoVector explanation/demo.
Project proposal drafts due (optional).
Feb 24
Wed
DNS design [slides]
Feb 26
Fri
Content Distribution Networks (CDNs) [slides]
Feb 29
Mon
Peer-to-Peer (P2P) [slides]
Project proposals due by 9PM.
Mar 2
Wed
Peer-to-Peer (P2P), part 2 [slides]
Mar 4
Fri
Talk by Mihir Nanavati
Consistency in Distributed Storage Systems [slides]
Mar 7
Mon
Transactions, part 1: ACID semantics and 2-phase locking [slides]
Mar 9
Wed
Transactions, part 2: logging [slides]
Mar 11
Fri
Industry talk by Jodi Spacek
Hootsuite micro services at scale. [slides]
Mar 14
Mon
Transactions, part 3: two phase commit [slides]

Readings:
  • Chapter 7 (Distributed Recovery) in Concurrency Control and Recovery in Database Systems by Bernstein et al. (covers all 2PC and 3PC material)
    Mar 16
    Wed
    Transactions, part 4: two phase commit in other topologies [slides]
    Mar 18
    Fri
    Transactions, part 5: three phase commit [slides]

    Teams must email TAs to schedule a project status meeting.
    Mar 21
    Mon
    Paxos, part 1 [slides]
    Mar 23
    Wed
    Paxos, part 2 [same slides]
    Mar 25
    Fri
    No class (Good Friday)
    Mar 28
    Mon
    No class (Easter Monday)
    Mar 30
    Wed
    Practice questions 26 and 27, and Paxos review [PQs]
    Apr 1
    Fri
    Assignment 4 solution review
    Apr 4
    Mon
    Practice questions 28 and 29 review [PQs]
    Apr 6
    Wed
    Distributed system design 100ft level [slides]
    Apr 8
    Fri
    Distributed system design 100ft level, continued [same slides]
    Last day of class.
    Apr 11
    Mon
    Project code and final reports due by 9PM.
    Apr 14
    Thu
    Final exam at 12:00 PM in IBLC 182.

    Go resources

    Go is a systems language designed at Google. It is especially well suited to building distributed systems. Like with any language, the fastest way to become proficient at Go is to put in the time writing programs in Go. Here are some resources to get you started:

    We will be using Go version 1.4.3 (the latest version that support was able to get working on ugrad machines). A couple of bits to keep in mind:
    • Go 1.4.3 is not the latest version. You can download this specific version, but most package managers will install the latest version instead, which is not the one we use. Be careful, and check that you are running the right version with go version.
    • The online Go documentation is for the latest version of Go, so avoid it if possible. As an alternative, you can:
    • Ugrad machines have Go installed in /cs/local/lib/pkg/go-1.4.3/; add /cs/local/lib/pkg/go-1.4.3/bin to your PATH.
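The version check mentioned above can also be done from inside a program. The following is a minimal sketch, assuming that release toolchains report their version as a string of the form "go1.4.3"; the names wantVersion and versionMatches are made up for this illustration and are not part of any course tooling.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// wantVersion is the toolchain version named on this page
// (hypothetical constant, used only in this sketch).
const wantVersion = "go1.4.3"

// versionMatches reports whether the reported toolchain version
// is exactly the expected one, ignoring surrounding whitespace.
func versionMatches(got, want string) bool {
	return strings.TrimSpace(got) == want
}

func main() {
	// runtime.Version returns the version of the Go toolchain
	// that built this binary.
	got := runtime.Version()
	if versionMatches(got, wantVersion) {
		fmt.Println("OK: running", got)
	} else {
		fmt.Printf("WARNING: running %s, but the course expects %s\n", got, wantVersion)
	}
}
```

Running this with go run on a ugrad machine and on your own laptop is a quick way to spot a mismatch before you submit.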

    Assignments

    There are four assignments. All assignments must be completed in Go and work with version 1.4.3 (which is installed on ugrad machines). You must work independently on assignments 1 and 2 and each student must submit their own solution source code. For assignments 3 and 4 you will work in teams of two. See collaboration guidelines at the bottom of this page for more information.

    Solutions must be submitted using the stash server by 9PM of the day of the deadline. Special instructions for compiling/running the code should be included as a README.txt file.

    For assignments 3 and 4 you will need to find a partner and sign up for a stash repository here.

    To access the hand-in git repository for assignment X as student with undergrad userid UID, run the following command:

    git clone https://stash.ugrad.cs.ubc.ca:8443/git/CS416_2015W2_NONE/assignmentX_UID.git

    Add your solution (and don't forget to push!) to the repository by the deadline.

    Assignments will be primarily marked based on functionality. Partial marks will be given to assignments that partially fulfill the specifications. It is in your best interest to properly comment and document your code to receive appropriate partial credit. All solutions must be formatted using gofmt (20% penalty to the mark for those that are not properly formatted).

    Assignment deadlines are listed in the schedule above and below. Assignment descriptions will be linked to from this page once they are available.

    • Jan 13 : Assignment 1 (Secure fortune client; individual)
    • Jan 22 : Assignment 2 (Secure fortune server; individual)
    • Feb 01 : Assignment 3 (Coordination via a faulty KV-service; in pairs)
    • Feb 12 : Assignment 4 (Replicated KV-store; in pairs)

    Project

    The project must address a non-trivial problem relevant to distributed systems. The project must include a substantial software effort in Go and must be done in a team of 4 students. Note that 'substantial' includes complexity and not just code size. The most direct way to satisfy the project requirement is to prototype a distributed system. Such a system can be built from scratch, but the project can also be formulated as a non-trivial extension to an existing system. The idea behind the system does not need to be original, but the majority of the distributed logic in the implemented system must be implemented by the project team.

    A list of posted project ideas (evolving).

    Sign up for a project stash repository here.

    Project constraints (evolving):

    • Go must be used for the core distributed logic in the system. However, other languages may also be used in the project.
    • The system must be well tested.
    • The system must provide diagnostic information for all distributed logic in the form of vector timestamped console logs. The GoVector library can be used for this. The resulting logs must be compatible with the ShiViz tool.
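To make the "vector timestamped" requirement concrete, here is a minimal sketch of a vector clock in Go. It is an illustration of the general technique only: it is not the GoVector API and it does not produce ShiViz-compatible output. The type VClock and the functions Tick, Copy, Merge, and HappenedBefore are names assumed for this example.

```go
package main

import "fmt"

// VClock maps a node name to the number of events observed from that node.
type VClock map[string]int

// Tick records one local event at node id.
func (v VClock) Tick(id string) { v[id]++ }

// Copy returns an independent copy, e.g. to attach to an outgoing message.
func (v VClock) Copy() VClock {
	c := make(VClock, len(v))
	for k, n := range v {
		c[k] = n
	}
	return c
}

// Merge folds a received clock into v by taking the element-wise maximum,
// then ticks id to account for the receive event itself.
func (v VClock) Merge(id string, recv VClock) {
	for k, n := range recv {
		if n > v[k] {
			v[k] = n
		}
	}
	v.Tick(id)
}

// HappenedBefore reports whether a is strictly earlier than b: every entry
// of a is <= the matching entry of b, and at least one is strictly smaller.
// Missing entries count as zero.
func HappenedBefore(a, b VClock) bool {
	strict := false
	for k, n := range a {
		if n > b[k] {
			return false
		}
		if n < b[k] {
			strict = true
		}
	}
	for k, n := range b {
		if _, ok := a[k]; !ok && n > 0 {
			strict = true
		}
	}
	return strict
}

func main() {
	alice, bob := VClock{}, VClock{}

	alice.Tick("alice") // alice's send event
	msg := alice.Copy() // timestamp carried by the message

	bob.Tick("bob") // an unrelated local event at bob
	beforeRecv := bob.Copy()

	bob.Merge("bob", msg) // bob receives the message

	fmt.Printf("bob after receive: alice=%d bob=%d\n", bob["alice"], bob["bob"])
	fmt.Println("send happened before receive:", HappenedBefore(msg, bob))
	fmt.Println("send vs. bob's earlier event ordered:",
		HappenedBefore(msg, beforeRecv) || HappenedBefore(beforeRecv, msg))
}
```

In a real project each log line would carry the sender's clock so that a tool can reconstruct the partial order of events; a map keyed by node name is used here only because it keeps the sketch short.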

    Deliverables

    • Project proposal: a paper detailing the problem, your proposed approach/solution, a realistic timeline for your team, and a SWOT analysis for your team.
      • Use your group's stash project repository to submit your proposal. Place your proposal into proposal/proposal.pdf at the top level of your repository (if you use LaTeX, make sure that it is compiled into a pdf).
      • To submit a project proposal draft, do the above step and also send Ivan an email with the group's repository name.
    • Prototype implementation: must involve substantial development effort. The prototype git repository must be shared with the course staff.
      • We expect your repository to include a detailed README file that explains how to compile/run your implementation.
    • Project report: a paper detailing the problem, your approach/solution, design of your prototype, and an evaluation of the prototype.
      • Project code and final reports are due on April 11th by 9PM.
      • Use your group's stash project repository to submit your report. Place your report into report/report.pdf at the top level of your repository (if you use LaTeX, make sure that it is compiled into a pdf).
      • The usual late policy applies (20% late penalty to mark; deliverables will not be accepted if late by more than 24 hours).
    • Project demo: a 15-minute private demo of your project to the instructor/group TA, including a technical Q/A regarding the project design and implementation.
      • The stash project repositories will not be frozen after you submit your code and report. So, you can continue to use your repository to develop and improve your demo.

    Deadlines

    The project is structured as a series of regularly occurring deadlines, listed in the schedule above and below. Do not miss these! The deadline deliverable must be submitted through stash by 9PM of the day of the deadline.

    • Feb 22 : Project proposal drafts (not marked, for feedback only)
    • Feb 29 : Project proposals
    • Mar 18 : Each team must send email to an assigned TA to schedule a meeting to discuss project status.
    • Apr 11 : Project code and final reports
    • Apr 12-22 : Project demos (15 minutes/group)

    Exam

    To practice for the exam we will go over 1-3 questions at the start of each class. You can also download the complete set of practice questions we have covered thus far (updated continuously).

    Final exam will be on April 14th, at 12:00 PM, in IBLC 182.

    Grading

    The final course mark will be based on:

    • Assignments: 30%
      • A1: 5%
      • A2: 5%
      • A3: 10% (team)
      • A4: 10% (team)
    • Project: 50%
      • Proposal: 15% (team)
      • Final report and code: 20% (team)
      • Demo: 15% (team)
    • Final exam: 20%

    Note that A3, A4, and the project must be team efforts. The team's mark for these deliverables is the same for all team members.
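As a worked example of how the weights above combine, the sketch below computes a final mark from per-component scores. The weights are copied from the list above; the sample scores, the component type, and the function names are invented for illustration, and late penalties are not modeled.

```go
package main

import "fmt"

// component pairs a deliverable with its weight in percent of the final mark.
type component struct {
	name   string
	weight float64
}

// weights mirrors the grading breakdown listed on this page.
var weights = []component{
	{"A1", 5}, {"A2", 5}, {"A3", 10}, {"A4", 10},
	{"Proposal", 15}, {"Report", 20}, {"Demo", 15},
	{"Exam", 20},
}

// totalWeight sums the weights; it should come to 100.
func totalWeight() float64 {
	sum := 0.0
	for _, c := range weights {
		sum += c.weight
	}
	return sum
}

// finalMark combines per-component scores (each out of 100) into a final
// mark out of 100. A component missing from scores counts as zero.
func finalMark(scores map[string]float64) float64 {
	total := 0.0
	for _, c := range weights {
		total += c.weight * scores[c.name] / 100
	}
	return total
}

func main() {
	// Hypothetical scores for one student.
	scores := map[string]float64{
		"A1": 90, "A2": 80, "A3": 70, "A4": 100,
		"Proposal": 80, "Report": 75, "Demo": 90,
		"Exam": 60,
	}
	fmt.Printf("weights sum to %.0f%%\n", totalWeight())
	fmt.Printf("final mark: %.1f\n", finalMark(scores)) // prints "final mark: 78.0"
}
```

Note how the three project components together carry as much weight as everything else combined.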

    Late policy

    The deadline for any assignment can be extended by one day with a 20% penalty to the mark. Assignments will not be accepted 24 hours past the original deadline.

    Deadlines for project deliverables are strict (more fault tolerance in a group of four).

    If you have an emergency (e.g., health) that prevents you from meeting a deadline, you must notify the instructor before the deadline.

    How to do well in this course

    Learn Go early and practice it regularly. Learning a new language while being time constrained is stressful and not fun. Since the assignments rapidly increase in their difficulty, it will be to your advantage to learn Go as quickly as possible and to learn it well. The posted Go resources are a great starting point, but reading is no substitute for practice, bug, debug, practice, practice, bug, coffee, debug, practice, ...

    Do not skimp on software engineering. Distributed systems are hard. They are hard to understand, to build, to debug, to run, to trace, to document, etc. Do not make your life any more difficult. Use best practices from software engineering to help you in this course. Write unit and integration tests, use version control, document your code with comments, write small prototypes, refactor your code, make your code readable and easy to run and debug. If you fail to follow best practices, they will come back to bite you later on. Unfortunately, this course will not explicitly teach you these best practices, but you probably took a course that introduced you to these concepts. If you have any questions, just ask us on Piazza.

    Choose your teammates wisely. Project success depends critically on your ability to work effectively within a team of four. Assignments 3 and 4 require a team of two: this is your chance to meet some folks and see if you would want them on your project team. You are responsible for resolving personal and technical differences among teammates on your own. Let us know as early as possible if you have team concerns, before they turn into crises.

    Invest time into the project. The project is worth half of the mark for the course, so it is important to me that you do a good job on it. However, you can only do a good job on the project if you allocate the time (particularly as you will need to learn and practice Go along the way!). Schedule your work and put consistent weekly effort into the project.

    Key project deliverables are write-ups; do these extra well. The proposal write-up alone is 15% of your final mark! The proposal and the final report must clearly convey the high-level ideas, be technically thorough, and must be well-written. Quality technical writing takes time and care. Use well-established methods to improve your writing: draft increasingly detailed outlines, get feedback from your peers/course staff on early ideas and drafts, compose descriptive infographics/diagrams, use the spellchecker, etc. Proposal write-ups that are vague or are incomplete will not be accepted (you will have to redo the proposal, but with much less time).

    Reach out for success. This is intended to be a challenging fourth-year course, but that does not mean that you have to work through it on your own! The course Piazza should be your first stop for all technical questions. The course has specific office hours (see top of page), but the TAs and I are flexible. Send any of us an email to schedule a time to discuss the course, the assignments, the project, etc. University students sometimes encounter setbacks that can impact academic performance. Discuss your situation with us or an academic advisor as early as possible. For help in addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit this link.

    Academic honesty and collaboration guidelines

    The department has a detailed policy regarding collaboration and plagiarism. You must familiarize yourself with this policy.

    Assignment 1: You must work by yourself. No code sharing is allowed. You can use any code examples that you find on the internet for help, but use these as a starting point and write your own code. If you have consulted a specific resource extensively, then note this in the README file (better be safe than sorry).

    Assignment 2: Same as assignment 1.

    Assignment 3: You must work in a team of two. Your team will receive a single mark for the assignment. No code sharing between teams is allowed. You can use any code examples that you find on the internet for help, but use these as a starting point and write your own code. If you have consulted a specific resource extensively, then note this in the README file (better be safe than sorry).

    Assignment 4: Same as assignment 3, except that you cannot use the same team.

    Acknowledgments

    Many of the materials used in this course are derived from CMU's 15-440: Distributed Systems course from Spring 2014, and are used with permission from the content authors.