416 Distributed Systems: Project 2 [Open ended]

Report due Apr 6th at 11:59PM

Winter 2018

Project 2 is an open-ended project that must be done in a team of 3-5 people and must be (at least partially) deployed on Azure. For this project you can use the same team from project 1, or you can form a new team. I encourage you to form teams of 4 people: this will allow us to grant you more time for your demos, and will provide you with sufficient developer power to execute on an ambitious project.

Note that two key project deliverables are write-ups (proposal/report). The proposal write-up alone is 10% of your final mark! The proposal and the final report must clearly convey the high-level ideas, be technically thorough, and must be well-written. Quality technical writing takes time and care. Use well-established methods to improve your writing: draft increasingly detailed outlines, get feedback from your peers/TAs on early ideas and drafts, compose descriptive infographics/diagrams, use the spellchecker, etc. Proposal write-ups that are vague, incomplete, or incoherent will receive a poor mark (you will also probably have to redo your proposal, but with much less time).

Type of project

Your project must address a non-trivial problem related to distributed systems. It must include a substantial software effort in Go. Note that 'substantial' includes complexity and not just code size. The most direct way to satisfy the project requirement is to prototype a distributed system. Such a system can be built from scratch, but the project can also be formulated as a non-trivial extension to an existing system. The idea behind the system does not need to be original, but the majority of the distributed logic in the implemented system must be implemented by the project team.

As a benchmark, your project must have about the same complexity/difficulty as project 1.

Project constraints (evolving):

Go must be used for the core distributed logic in the system. However, other languages may also be used in the project. For example, you can build a distributed system in Go and have Android clients, implemented in Java, that connect to it and use it.
The system must be able to support node churn: nodes that fail and leave the system, as well as nodes that join the system.
The system cannot be embarrassingly parallel: there must be some distributed state and coordination between nodes in your system.
At least some part of the system must be deployed on Azure.
The system must be well tested.

Project ideas

Here are several project ideas. Treat these as inspiration; I strongly encourage you to come up with your own project idea.

Project idea: Build an anonymity network

Tor is an anonymity system built on onion routing. Tor allows clients to obfuscate their network identity/location (IP address). The idea is simple, but supporting multiple clients, defending against attacks, and providing good performance to clients (e.g., responsive browsing) are non-trivial requirements.

One version of this project is to prototype a basic version of Tor, deploy it on Azure, and demonstrate that you can use it to browse the internet. A basic version might include:

Handling connecting/disconnecting guard/relay/exit nodes
Secure onion routing (intermediate hops do not observe payload)
Circuit setup/tear-down protocols
Periodic circuit refresh to avoid using a circuit for too long
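
The "intermediate hops do not observe payload" property comes from layered encryption. The sketch below illustrates the idea with AES-GCM from the Go standard library. It assumes one pre-shared symmetric key per hop, which is a simplification: a real design would negotiate these keys during circuit setup. The function names (`addLayer`, `peelLayer`, `wrapOnion`) are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM cipher from a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// addLayer encrypts data under one hop's key and prepends the random nonce.
func addLayer(key, data []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, data, nil), nil
}

// peelLayer removes one layer. It fails if the key is wrong
// or if the onion was modified in transit.
func peelLayer(key, onion []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(onion) < gcm.NonceSize() {
		return nil, errors.New("onion shorter than nonce")
	}
	nonce, ciphertext := onion[:gcm.NonceSize()], onion[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

// wrapOnion encrypts payload once per hop. hopKeys is ordered
// guard, relay, exit, so the exit layer is applied first and the
// guard layer last.
func wrapOnion(hopKeys [][]byte, payload []byte) ([]byte, error) {
	onion := payload
	for i := len(hopKeys) - 1; i >= 0; i-- {
		var err error
		if onion, err = addLayer(hopKeys[i], onion); err != nil {
			return nil, err
		}
	}
	return onion, nil
}

func main() {
	// Assumption: one pre-shared 32-byte key per hop.
	keys := make([][]byte, 3)
	for i := range keys {
		keys[i] = make([]byte, 32)
		if _, err := rand.Read(keys[i]); err != nil {
			panic(err)
		}
	}
	onion, err := wrapOnion(keys, []byte("GET / HTTP/1.1"))
	if err != nil {
		panic(err)
	}
	for i, key := range keys { // each hop peels exactly one layer, in path order
		if onion, err = peelLayer(key, onion); err != nil {
			panic(err)
		}
		fmt.Printf("hop %d forwards %d bytes\n", i, len(onion))
	}
	fmt.Printf("exit sees: %s\n", onion)
}
```

Only the last hop recovers the plaintext; each earlier hop sees an opaque ciphertext for the next hop. The sketch leaves out routing information (the next-hop address inside each layer), which a real circuit would need.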

Tor is just one type of anonymity system. If you are interested in this space, there are a variety of other system designs that you can adopt. Or, feel free to create a new one!

Project idea: Build a peer-to-peer machine learning system

Machine learning is all the rage. There are many distributed frameworks, but all of them assume a centralized learning process with access to a central store of training data. Build a peer-to-peer solution for learning a global model (of a variety of your choice) that has as few centralized components as possible and where data is spread across peers. Assume an adversarial context in which peers do not want to reveal their data to others. For this project you may want to recruit to your team someone who has taken CPSC 340 (and has done well in it). You can also substantially expand the security/privacy requirements of this project.
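
One decentralized building block you could consider is gossip averaging, in which pairs of peers repeatedly average their locally trained model parameters instead of sending them to a central server. The sketch below is an illustration under simplifying assumptions: the function name `averagePair` is hypothetical, peers are simulated in one process, and no privacy protection is included (exchanged parameters can still leak information about the underlying data).

```go
package main

import (
	"errors"
	"fmt"
)

// averagePair replaces both peers' parameter vectors with their
// element-wise mean. The sum over all peers is unchanged, so repeated
// pairwise averaging moves every peer toward the global mean.
func averagePair(a, b []float64) error {
	if len(a) != len(b) {
		return errors.New("parameter vectors differ in length")
	}
	for i := range a {
		mean := (a[i] + b[i]) / 2
		a[i], b[i] = mean, mean
	}
	return nil
}

func main() {
	// Four simulated peers, each holding a locally trained 2-parameter model.
	peers := [][]float64{{1, 0}, {3, 4}, {5, 8}, {7, 12}}
	for round := 0; round < 20; round++ {
		for i := range peers {
			// Rotate the partner each round; the offset is never zero,
			// so a peer never averages with itself.
			j := (i + 1 + round%(len(peers)-1)) % len(peers)
			if err := averagePair(peers[i], peers[j]); err != nil {
				panic(err)
			}
		}
	}
	fmt.Println(peers)
}
```

In a real system each exchange would be a network message between two peers, and you would need to handle peers that join or fail mid-exchange.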

Project idea: Build a distributed web crawler/search engine

Web crawling is kind of a 90s topic. But an efficient and scalable version is a complex distributed system with many interesting pieces. Last year's assignment 5 describes an 'assignment' version of a web crawler that is a good starting point. That version describes a set of worker crawlers that are spread over multiple data-centers, a web-graph that is maintained in a distributed fashion, a distributed page rank computation, and keyword search capability. You could extend this version or consider building a different variant.
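
One design question in such a crawler is which worker is responsible for which part of the web graph. The sketch below shows a simple option, assuming ownership is decided by hashing the URL's host name; the function `ownerOf` and the worker names are hypothetical and are not taken from the assignment 5 specification.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"net/url"
	"strings"
)

// ownerOf returns the worker responsible for the host in rawURL.
// All pages on the same host map to the same worker.
func ownerOf(rawURL string, workers []string) (string, error) {
	if len(workers) == 0 {
		return "", errors.New("no workers available")
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return "", fmt.Errorf("no host in %q", rawURL)
	}
	h := fnv.New32a()
	h.Write([]byte(host)) // Write on a hash.Hash never returns an error
	return workers[h.Sum32()%uint32(len(workers))], nil
}

func main() {
	workers := []string{"worker-0", "worker-1", "worker-2"}
	for _, link := range []string{
		"http://example.com/index.html",
		"http://example.com/about.html",
		"http://example.org/",
	} {
		owner, err := ownerOf(link, workers)
		if err != nil {
			panic(err)
		}
		fmt.Println(link, "->", owner)
	}
}
```

Note that modulo hashing reassigns most hosts whenever the worker set changes. Under node churn, a scheme such as consistent hashing would move far fewer hosts between workers.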

Some other project ideas

Build a distributed object system, like Emerald but without a compiler.
Build a distributed shared memory system, like TreadMarks.
Build a distributed assertions mechanism that can be used for runtime checking of distributed systems.
Implement a Byzantine fault tolerance algorithm, such as PBFT.
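
For the Byzantine fault tolerance idea, a useful first step is the quorum arithmetic: PBFT tolerates f Byzantine replicas out of n >= 3f+1, and its quorums contain 2f+1 distinct replicas. The sketch below encodes just that arithmetic plus a vote counter. The `VoteSet` type and function names are hypothetical, and the sketch omits everything else in the protocol (views, digests, signatures, message phases).

```go
package main

import "fmt"

// maxFaulty returns the largest f such that n >= 3f+1.
func maxFaulty(n int) int {
	if n < 1 {
		return 0
	}
	return (n - 1) / 3
}

// quorumSize returns 2f+1 for a cluster of n replicas.
func quorumSize(n int) int {
	return 2*maxFaulty(n) + 1
}

// VoteSet collects matching votes for a single request from distinct replicas.
type VoteSet struct {
	n      int
	voters map[int]bool
}

func NewVoteSet(n int) *VoteSet {
	return &VoteSet{n: n, voters: make(map[int]bool)}
}

// Add records a vote from the given replica and reports whether a quorum
// has been reached. Duplicate votes from one replica count only once.
func (v *VoteSet) Add(replica int) bool {
	v.voters[replica] = true
	return len(v.voters) >= quorumSize(v.n)
}

func main() {
	for _, n := range []int{4, 7, 10} {
		fmt.Printf("n=%d tolerates f=%d, quorum=%d\n", n, maxFaulty(n), quorumSize(n))
	}
	votes := NewVoteSet(4)
	for _, replica := range []int{0, 0, 1, 2} {
		fmt.Printf("vote from %d, quorum reached: %v\n", replica, votes.Add(replica))
	}
}
```

Counting distinct replicas rather than messages matters: a faulty replica that sends the same vote many times must not be able to form a quorum on its own.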

Proposal

A project proposal is a paper that details the problem, your proposed approach/solution to the problem, a realistic timeline for your team's actions to create the solution, and a SWOT analysis for your team/project.

You should aim for a proposal that is about 5 pages long. Shorter and you're probably missing some detail; longer and it becomes too detailed and too long to read. That said, there are no page limits (neither lower nor upper bounds) on your proposal.

Here are three example proposals from the last time I taught this course with an open-ended project:

Here are two high-level ways in which I think about your proposal:

A proposal is a contract. If you build the thing described in the proposal then you get a perfect mark on the project. But, writing good contracts is hard work. For example, a good contract must be precise (it should be clear what you are and are not going to do).
A proposal is your opportunity to convince me that you know what you're getting yourself into. I won't let you do a project if I know that you do not stand a reasonable chance of succeeding at it (this is a distributed systems course, not an SE course :-). So, the proposal should convince me that you know what you're doing: that you've thought about the key issues (you know what they are and approximately how you're going to solve them), that you know what resources you will need and where you will get them (technology/libraries/algorithms/data sources/hardware/etc.), that you've thought about how to manage your time and the team's roles and responsibilities (who does what/when), and that it all adds up to a realistic plan for a successful project.

You may also find the following proposal advice useful (from a grad course that I taught).

SWOT analysis:

Your proposal must include a SWOT analysis, which is a project planning tool/exercise. The focus of the SWOT analysis should not be on your idea, but on the various factors that will influence your ability to execute successfully. This includes things like human resources, time/scheduling constraints, etc.

There are three key things you should focus on when you put this together:

Do this as a team: don't outsource this to one team member
Be honest: if you are worried about something, this is your chance to get it out in writing
Be specific: you want each item in SWOT to be one concrete factor, so articulate it as tightly as possible.

Here are some fairly generic examples (i.e., yours should be more specific):

Internals (strengths/weaknesses):

s: all team members have worked with each other before, so are familiar with each other's work style
s: entire team has extensive experience in programming in Go
s: project is based on an existing system that is well documented and that two of the team members know inside and out
w: none of the team members know each other
w: team members have a variety of communication styles, some of which will require non-trivial management
w: project will be difficult because none of the team members understood Ivan's lecture on BitTorrent

Externals (threats/opportunities) -- you'll probably have fewer of these than the internal ones:

t: team decided to use Android phones, but this requires finding a library that supports Go-Dalvik VM cross-compilation, which may or may not exist
t: three of the four team members might have to leave town to compete in the pan-American synchronized swimming competition; this would make them lose two weeks of project work.
o: one of the team members has a relative who works at Raspberry Pi and has agreed to send us 100 Pis to use for the 416 project
o: new version of Go comes out in two weeks and the word on the street is that this version will include native support for distributed objects, which will make our project 10x faster to build

Your proposed project might evolve

The proposal is your best effort at scoping out the challenges that you expect to come up against, along with some ideas and a plan for how you will resolve them. But, of course, system design and software engineering are not that predictable.

It's difficult to say in general how much you can deviate from the proposal. For example, using UDP instead of TCP may not be a significant change for some proposals, but could be a major change for others (e.g., if you are investigating distributed congestion control adaptation in TCP and now change to UDP, the difference is major!).

Please discuss potential major changes with the TA assigned to your group and/or with Ivan.

Prototype implementation

There are no constraints on your distributed system design and implementation outside of the ones listed at the top. If you have any questions, please ask on Piazza.

Report

Your final report is a description of the problem you attempted to solve, what you have built to solve the problem, why you built your system the way you did, and how the system works/doesn't work. You should aim for a final report that is no more than 10 pages long.

Deliverables

For further details about the report/code/demo deliverables and marking process, see this Piazza post.

All project 2 deliverables are due at 11:59PM on their respective dates.

Project proposal

Use your group's stash project repository to submit your proposal. Place your proposal into proposal/proposal.pdf at the top level of your repository (if you use LaTeX, make sure that it is compiled into a PDF).
To submit a project proposal draft, do the above step and also email Ivan the group's repository name. Use the subject line (with [[title]] replaced with your project title): [416] Project 2 proposal draft: [[title]]

Prototype implementation

~~Your repository should include a detailed README file that explains how to compile/configure/run your implementation.~~

Project report: a paper detailing the problem, your approach/solution, design of your prototype, and an evaluation of the prototype.

Use your group's stash project repository to submit your report. Place your report into report/report.pdf at the top level of your repository (if you use LaTeX, make sure that it is compiled into a PDF).

Project demo: a TBD-minute private demo of your project to the instructor/group TA, including a technical Q/A regarding the project design and implementation.

The stash project repositories will not be frozen after you submit your code and report. So, you can continue to use your repository to develop and improve your demo.

Deadlines

The project is structured as a series of regularly occurring deadlines. Do not miss these! Each deliverable must be submitted through stash by 11:59PM on the day of its deadline.

Mar 2 : Project proposal drafts (not marked, for feedback only). If you do not email Ivan, then he will not read your draft.
Mar 9 : Final project proposals
Mar 23 : Each team must email its assigned TA to schedule a meeting to discuss project status.
Apr 6 : Project code and final reports
Apr 9-20 : Project demos (TBD minutes/group)

Logistics

Grading scheme

Project 2 is 35% of your final mark. Here is the mark breakdown:

Proposal: 10%
Report and code: 15%
Demo: 10%
Peer review multiplier

Extra credit

This project is extensible with two kinds of extra credit:

EC1 [2% of final mark]: Add support for GoVector and ShiViz to your system. Generate comprehensible ShiViz diagrams that explain your distributed system data/control flow and protocol design. These diagrams/explanations must be in your final report and you must show a live demo (loading logs into ShiViz and generating and explaining the result). Store the logs behind the diagrams used in your report and demo in the report repository.
EC2 [2% of final mark]: Demonstrate the likely correctness of your system by using the Dinv dynamic program analysis tool. You must generate at least 3 types of invariants that illustrate 3 different kinds of correctness conditions of your system. These must be listed and explained in the final report. The logs that lead to the properties you describe must be part of the report repository. You do not have to demo Dinv.
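
If you have not worked with vector clocks before, the sketch below shows the basic rule that vector-clock logging relies on to order events across nodes. It is a hand-rolled illustration and does not reproduce GoVector's actual API; the `VClock` type and its methods are hypothetical names.

```go
package main

import "fmt"

// VClock maps a node ID to the number of events seen from that node.
type VClock map[string]uint64

// Tick records a local event on node id.
func (vc VClock) Tick(id string) { vc[id]++ }

// Copy returns an independent snapshot, e.g. to attach to an outgoing message.
func (vc VClock) Copy() VClock {
	out := make(VClock, len(vc))
	for id, t := range vc {
		out[id] = t
	}
	return out
}

// Merge takes the element-wise maximum with a clock received in a message.
func (vc VClock) Merge(other VClock) {
	for id, t := range other {
		if t > vc[id] {
			vc[id] = t
		}
	}
}

// HappenedBefore reports whether vc is less than or equal to other in
// every entry and strictly less in at least one entry.
func (vc VClock) HappenedBefore(other VClock) bool {
	strictlyLess := false
	for id, t := range vc {
		if t > other[id] {
			return false
		}
		if t < other[id] {
			strictlyLess = true
		}
	}
	for id, t := range other {
		if _, ok := vc[id]; !ok && t > 0 {
			strictlyLess = true
		}
	}
	return strictlyLess
}

func main() {
	a, b := VClock{}, VClock{}
	a.Tick("a")     // node a performs a send event
	msg := a.Copy() // the clock travels with the message
	b.Merge(msg)    // node b receives the message
	b.Tick("b")     // and records the receive event
	fmt.Println("a:", a, "b:", b, "a before b:", a.HappenedBefore(b))
}
```

Two events whose clocks are not ordered in either direction are concurrent, which is the kind of relationship a space-time diagram makes visible.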

Make sure to follow the course collaboration policy.