Project 2 is an open-ended project that must be done in a team of 4
people and must be (at least partially) deployed on Azure. The extra
number of people (over project 1) will provide you with sufficient
developer power to execute on an ambitious project.
Type of project
Your project must address a non-trivial problem related
to distributed systems. It must include a substantial software
effort in Go. Note that 'substantial' includes complexity and not just
code size. The most direct way to satisfy the project requirement is
to prototype a distributed system. Such a system can be built from
scratch, but the project can also be formulated as a non-trivial
extension to an existing system. The idea behind the system does not
need to be original, but the majority of the distributed logic in the
implemented system must be implemented by the project team.
As a benchmark, your project must have about the same
complexity/difficulty as project
1.
Project constraints (evolving):
- Go must be used for the core distributed logic in the
system. However, other languages may also be used in the
project. For example, you can build a distributed system in Go and
have Android clients, implemented in Java, that connect to it and
use it.
- The system must be able to support node churn: nodes that fail
and leave the system, as well as nodes that join the system.
- The system cannot be embarrassingly parallel: there must be some
distributed state and coordination between nodes in your
system.
- At least some part of the system must be deployed on Azure.
- The system must be well tested.
Project ideas
Here are several project ideas. Treat these as inspiration; I strongly
encourage you to come up with your own project idea.
Project idea: Consensus-Based Failure Detection
After A1 you've become more interested in failure detectors. You start
reading more and realize that the failure detector you developed in A1
is not robust to network partitions, which may cause inconsistencies
in the system about the state of a node. For example, Node A and Node
B are monitoring Node C. If Node A is partitioned from Node B and Node
C, then Node A will incorrectly report that Node C has failed while
according to Node B, Node C is still alive. You find out about the
Paxos algorithm and realize that this problem is in fact a consensus
problem. You decide to implement a consensus protocol, like Paxos or
Raft, on top of your failure detector library to decide if a majority
of systems believe that a node has really failed.
Project idea: Regionally Restricted Streaming Service
If you were to take a random survey of current UBC students to
determine the answer to the question "How do you procrastinate?", the
majority of the responses would have something to do with watching
Netflix. But as a current 416 student, you have already been exposed
to web-proxies and CDNs. So, now your favorite way of procrastinating
is to build Netflix from scratch instead of watching it. To do this,
you would need to make a video streaming service built on top of a
custom CDN, which provides regional restrictions. To make your system
more robust, you may also choose to use a distributed key-value store
to house your user data.
Project idea: A version of DynamoDB
"Amazon is expanding in downtown!" is something you have gotten used
to hearing after living in Vancouver the last few years. As an
upper-year CS major, you have started looking for well-paid full time
jobs and identified Amazon as a viable destination. Being a smart
group of 416 students, you decide to build Amazon's DynamoDB from
scratch as you think that would be a good way to impress an Amazon
recruiter. You also like how they use CRDTs to provide eventually
consistent reads but you are also interested in finding out how they
can also provide strongly consistent reads across their whole
distributed database. And, to widen your scope, you decide to use your
newly built DynamoDB on Azure instead of AWS to increase your chances
of impressing a Microsoft recruiter, as well!
Project idea: Build an automated grading framework
As some of you have already seen and experienced by now, distributed
systems are hard to build and even harder to debug. One of the main
reasons for this is the presence of different sources of
non-determinism present in the system, specifically due to the
network. This makes it even harder to grade complex systems as testing
for basic correctness properties of a distributed system requires a
lot of time/effort. As some of the current and previous 416 TAs will
tell you, writing specialized auto-grading frameworks for each
distributed system can be time-consuming and error-prone. Your
mission, should you choose to accept it, is to create an automated
grading framework that works for any general purpose distributed
system which provides a clean and tidy interface to control and
specify different network conditions as well as a way to specify
different test cases.
Project idea: Build an anonymity network
Tor
is an anonymity system built
on onion
routing. Tor allows clients to obfuscate their network
identity/location (IP address). The idea is simple, but supporting
multiple clients, defending against attacks, and providing good
performance to clients (e.g., responsive browsing) are non-trivial
requirements.
One version of this project is to prototype a basic version of Tor,
deploying it on Azure, and demonstrating that you can use it to
browser the internet. A basic version might include:
- Handling connecting/disconnecting guard/relay/exit nodes
- Secure onion routing (intermediate hops do not observe payload)
- Circuit setup/tear-down protocols
- Periodic circuit refresh to avoid using a circuit for too long
Tor is just one type of anonymity system. If you are interested in
this space, there are a variety of other system designs that you can
adopt. Or, feel free to create a new one!
Project idea: Build a peer-to-peer machine learning system
Machine learning is all the rage. There are many distributed
frameworks, but all of them assume a centralized learning process with
access to a central store of training data. Build a peer-to-peer
solution for learning a global model (of a variety of your choice)
that has as few centralized components as possible and where data is
spread across peers. Assume an adversarial context in which peers do
not want to reveal their data to others. For this project you may want
to recruit to your team someone who has taken CPSC 340 (and has done
well in it). You can also substantially expand the security/privacy
requirements of this project.
Project idea: Build a distributed web crawler/search engine
Web crawling is kind of a 90s topic. But, an efficient and scalable
version is a complex distributed system with many interesting
pieces. An assignment
from 416-2016w2 describes an 'assignment' version of a web
crawler that is a good starting point. This version described a set of
worker crawlers that are spread over multiple data-centers, a
web-graph that is maintained in a distributed fashion, a distributed
page rank computation, and keyword search capability. You could extend
this version or consider building a different variant.
Some other project ideas
- Build a distributed object system,
like Emerald
but without a compiler.
- Build a distributed shared memory system,
like Treadmarks
- Build a distributed assertions mechanism that can be used for
runtime checking of distributed systems.
- Implement a byzantine fault tolerance algorithm, an example
is PBFT
Proposal
A project proposal is a paper that details the problem, your
proposed approach/solution to the problem, a realistic timeline for
your team's actions to create the solution, and
a SWOT
analysis for your team/project.
You should aim for a proposal that is about 5 pages long. Shorter and
you're probably missing some detail; longer and it becomes too
detailed and too long to read. That said, there are no page
limits (lower bounds nor upper bounds) on your proposal.
Here are three example proposals from a prior instance of this
course:
Here are two high-level ways in which I think about your proposal:
- A proposal is a contract. If you build the thing described
in the proposal then you get a perfect mark on the project. But,
writing good contracts is hard work. For example, a good contract must
be precise (it should be clear what you are and are not going to do).
- A proposal is your opportunity to convince me that you know
what you're getting yourself into. I won't let you do a project if
I know that you do not stand a reasonable chance of succeeding at it
(this is a distributed system course, not an SE course :-) So, the
proposal should convince me that you know what you're doing -- that
you've thought about the key issues (you know what they are,
approximately how you're going to solve them), you know what resources
you will need/where you will get them
(technology/libraries/algorithms/data sources/hardware/etc), that you
thought about how to manage your time and how to manage the team roles
and responsibilities (who does what/when), and that it all adds up to
a realistic plan for a successful project.
You may also find the
following proposal
advice useful (from a grad course that I taught).
SWOT analysis:
Your proposal must include a SWOT analysis, which is a project
planning tool/exercise. The focus of the SWOT analysis should not be
on your idea, but on the various factors that will influence your
ability to execute successfully. This includes things like human
resources, time/scheduling constraints, etc.
There are three key things you should focus on when you put this together:
- Do this as a team: don't outsource this to one team member
- Be honest: if you are worried about something, this is your chance
to get it out in writing
- Be specific: you want each item in SWOT to be one concrete factor,
so articulate it as tightly as possible.
Here are some fairly generic examples (i.e., yours should be more specific):
Internals (strengths/weakness):
- s: all team members have worked with each other before, so are
familiar with each other's work style
- s: entire team has extensive experience in programming in Go
- s: project is based on an existing system that is well documented
and that two of the team members know inside and out
- w: none of the team members know each other
- w: team members have a variety of communication styles, some of
which will require non-trivial management
- w: project will be difficult because none of the team members
understood Ivan's lecture on BitTorrent
Externals (threats/opportunities) -- you'll probably have fewer of these than the internal ones:
- t: team decided to use Android phones, but this require finding a
library that supports Go-Dalvik VM cross-compilation, which may or may
not exist
- t: three of the four team members might have to leave town to compete
in the pan-American synchronized swimming competition; this would make
them lose two weeks of project work.
- o: one of the team members has a relative that works at Raspberry
Pi who agreed to send us 100 Pis to use for the 416 project
- o: new version of Go comes out in two weeks and the word on the
street is that this version will include native support for
distributed objects, which will make our project 10x faster to
build
Your proposed project might evolve
The proposal is your best effort at scoping out the challenges that
you expect to come up against and some ideas/plan on how you will
resolve these. But, of course, system design and software engineering
is not that predictable.
It's difficult to describe how much you can deviate from the
proposal. So, UDP instead of TCP may not be a significant change for
some proposals, but could be a major change for others (e.g., if you
are investigating distributed congestion control adaptation in TCP and
now change to UDP, the difference is major!).
Please discuss potential major changes with the TA assigned to your
group and/or with Ivan.
Prototype implementation
There are no constraints on your distributed system design and
implementation outside of the ones listed at the top. More details
listed below. If you have any questions, please ask on Piazza.
Report
Your final report is a description of the problem you attempted to
solve, what you have built to solve the problem, why you built your
system the way you did, and how the system works/doesn't work. You
should aim for a final report that is no more than 8 pages long. More
details below.
Details about deliverables
- Project proposal: a well thought out project plan.
-
[Draft proposal submission] To submit a project proposal
draft, create a private piazza post. For this post use the title:
"Project 2 proposal draft: [[title]], uidA uidB uidC uidD" with
with [[title]] replaced with your project title/name. The
easiest way for us to give you feedback is if your post includes
a link to an editable google doc containing your
proposal. Make sure to identify the group members in the pizza
post and the proposal body.
-
[Final proposal submission] If you submitted a google doc
that we can access (above), then continue working on that doc
and we will snapshot your google doc at deadline time. If you
did not submit a google doc, then you can update your piazza
post with the final proposal doc copy, preferably in pdf format.
- Project report: a paper detailing the problem, your
approach/solution, design of your prototype, and an evaluation of
the prototype.
- Report must be 8 pages max. This includes all the things that
you want Ivan/TAs to read/see, including ShiViz diagrams.
- Use your group's github project repository to submit your
report as a pdf document. Place your report
into report/report.pdf at the top level of your
repository (if you use LaTex, make sure that it is compiled into
a pdf).
- Use whatever format you want, but please don't torture us with
font size 8 and awful margins. I recommend the
2-column ACM
article format.
- You can copy/paste and reuse text from the proposal.
- But, of course, don't plagiarize other's work! Attribute all
the images/text you borrow; standard writing practices apply.
- Your report must stand on its own -- cannot refer to proposal
or to a youtube video where you explain your system in an
hour-long lecture.
- The report must describe the system whose code you are
submitting by the code/report deadline. This means that if you
have some bits that are unfinished, but you plan to finish them
for the demo, then you must explicitly note in your
report that they are a work in progress. Note that the
ShiViz extra credit cannot be a work in progress item; we have
to see ShiViz diagrams for your system in the report.
- The report should include an approximate description of your
demo script. The demo has specific requirements (see below), and
we want to see an outline of your demo plan that matches these
requirements in your report. This doesn't have to be long: a short
paragraph per demo stage is fine.
- Prototype implementation: the details of your system,
primarily its code.
- Make sure to create a repository with the name
P2-uidA-uidB-uidC-uidD using the uids of the four
team-members on your team. Follow previous
assignment/project instructions
to allow Ivan and the TAs access to the repository (but not
other teams in the course).
- We will read your code.
- We will read the code that is in your repo by the
deadline.
- Note that this version of the code must correspond to what
you describe in your report. For example, if you describe a
Tor-based system in your report and do not note any
work-in-progress items, and we do not see any onion encryption
code in your repo, your mark will be penalized.
- It is okay if your code doesn't work completely! Your report
should note what currently works and what doesn't work.
- No, you are not required to include code comments, compilation
instructions, or anything else that would make our code-reading
lives easier. (Though we would certainly appreciate any such
effort).
- Project demo: a 25-minute private demo of your project to
the instructor and group TA, including a technical Q/A regarding
the project design and implementation.
- The github project repositories will not be frozen
after you submit your code and report. So, you can continue to use
your repository to develop and improve your system for the demo!
Yes, that means that you can add new code/change existing
code/etc.
- No, we don't care how much new code you add between report and
the demo -- if only 10% of your proposed system is built by
report-time (and 90% is a work-in-progress), then expect
penalties on the code/report. The demo is a separate beast
marking-wise.
- If you are working on the EC, you must generated GoVector logs
and use ShiViz live during your demo to receive full EC
marks. You can do so during the normal operation step, or
another step in the demo (below).
- Your demo is 25 minutes long. Here are the
components/time/demo-mark break down:
- Demonstrate normal operation of your system (no
failures/joins) with at least 3 nodes.
- 10min expected
- 40% of demo mark
- Demonstrate system can survive at least 3 node
failures
- 5min expected
- 20% of demo mark
- Demonstrate system can join and utilize at least 3
new nodes
- 5min expected
- 20% of demo mark
- Design Q/A
- 5min required (we will stop you at 20min mark to do Q/A)
- 20% of demo mark
- We will ask questions of the entire group and anyone on
your team can answer.
- There are several critical notions in the rubric above that
will vary from system to system (group to group):
- Normal operation: show that your system achieves its stated
function (e.g., serves HTML to web-browser clients from a CDN,
sends email via ToR, etc)
- Survive: show that your system's normal operation is not
disrupted by the failures (e.g., game continues to be playable
after failures)
- Utilize: show that your system actively uses the newly
joined nodes (e.g., database integrates and uses new nodes to
store keys/values)
- To get full marks on the demo you must (1) define the above in
the report or in the demo, and (2) demonstrate to us that the
above conditions are satisfied by your system during the demo
(e.g., when you fail/join nodes). You can do this by some of the
following:
- Show us terminal output with copious verbal
explanations
- Show us a web browser GUI that shows us blinking lights that
semantically match the above goal
- Robots that behave as expected (where you defined for us
what is expected)
- Some other, typically runtime system I/O, means
- For failures, you can decide which nodes to fail and how
(though if you fail the Azure LB that has a standby that you did
not build.. you won't be getting much/any of that
20% survival mark).
- Yes, we want to see you inject failures, preferably on a
terminal with a Ctl-C signal. Same for node joins.
- Your project must use Azure in some way. You have to
explain/show that this is indeed the case.
- Your system must use a real network between your nodes --
distributed systems that uses localhost for communication will be
severely penalized.
- As with project 1 demos, your slot is tight -- we may have
scheduled other groups before/after your group. I strongly
encourage you to practice your demo multiple times and develop a
demo script.
- Note that unlike project 1 demos, your project 2 executes in
your environment. This means that you can curate it and set it up
just the way you want it.
- Some projects may have special requirements (e.g., prohibit
failure of 3 nodes). If this is the case, post on piazza to
arrange a change to your demo components. You must discuss these
with us before your demo and we have to sign off on any
deviations from the above in writing via a piazza post.
Deadlines
All project 2 deliverables are due at 11:59PM on their respective
dates. The project is structured as a series of regularly occurring
deadlines Do not miss these!
-
Nov 12 : Project proposal drafts (not marked, for feedback only).
-
Nov 17 : Final project proposals
-
Nov 24 : Each team must send email to an assigned TA to schedule a
meeting to discuss project status.
-
Dec 9 : Project code and final reports
-
Dec 10-13 : Project demos
Grading scheme
Note that two key project deliverables are write-ups
(proposal/report). The proposal write-up alone is 10% of your final
mark! The proposal and the final report must clearly convey the
high-level ideas, be technically thorough, and must be
well-written. Quality technical writing takes time and care. Use
well-established methods to improve your writing: draft increasingly
detailed outlines, get feedback from your peers/TAs on early ideas and
drafts, compose descriptive infographics/diagrams, use the
spellchecker, etc. Proposal write-ups that are vague, incomplete, or
incoherent will receive a poor mark (you will also probably have to
redo your proposal, but with much less time).
Project 2 is 30% of your final mark. Here is the mark breakdown:
- Proposal: 10%
- Report and code: 10%
- Demo: 10%
Extra credit
This project is extensible with an extra credit:
-
EC1 [2% of final mark]: Add support
for GoVector
and ShiViz to your
system. Generate comprehensible ShiViz diagrams that explain your
distributed system data/control flow and protocol design. These
diagrams/explanations must be in your final report and you must show a
live demo (loading logs into ShiViz and generating and explaining the
result). Store the logs for your diagrams in the report/demo in the
report repository.
Make sure to follow the
course collaboration policy.
|