416 Distributed Systems: Project 2 [Open ended]
Report due Apr 6th at 11:59PM
Project 2 is an open-ended project that must be done in a team of 3-5 people and must be (at least partially) deployed on Azure. For this project you can use the same team from project 1, or you can form a new team. I encourage you to form teams of 4 people: this will allow us to grant you more time for your demos, and will provide you with sufficient developer power to execute on an ambitious project.
Note that two key project deliverables are write-ups (proposal/report). The proposal write-up alone is 10% of your final mark! The proposal and the final report must clearly convey the high-level ideas, be technically thorough, and must be well-written. Quality technical writing takes time and care. Use well-established methods to improve your writing: draft increasingly detailed outlines, get feedback from your peers/TAs on early ideas and drafts, compose descriptive infographics/diagrams, use the spellchecker, etc. Proposal write-ups that are vague, incomplete, or incoherent will receive a poor mark (you will also probably have to redo your proposal, but with much less time).
Type of project
Your project must address a non-trivial problem related to distributed systems. It must include a substantial software effort in Go. Note that 'substantial' includes complexity and not just code size. The most direct way to satisfy the project requirement is to prototype a distributed system. Such a system can be built from scratch, but the project can also be formulated as a non-trivial extension to an existing system. The idea behind the system does not need to be original, but the majority of the distributed logic in the implemented system must be implemented by the project team.
As a benchmark, your project must have about the same complexity/difficulty as project 1.
Project constraints (evolving):
Here are several project ideas. Treat these as inspiration; I strongly encourage you to come up with your own project idea.
Project idea: Build an anonymity network
Tor is an anonymity system built on onion routing. Tor allows clients to obfuscate their network identity/location (IP address). The idea is simple, but supporting multiple clients, defending against attacks, and providing good performance to clients (e.g., responsive browsing) are non-trivial requirements.
One version of this project is to prototype a basic version of Tor, deploy it on Azure, and demonstrate that you can use it to browse the internet. A basic version might include:
Tor is just one type of anonymity system. If you are interested in this space, there are a variety of other system designs that you can adopt. Or, feel free to create a new one!
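To make the onion routing idea concrete, here is a minimal sketch of layered encryption in Go. It is an illustration under simplifying assumptions, not how Tor itself is implemented: the client is assumed to already share one AES key with each relay (real systems negotiate these keys per circuit), and the names WrapOnion and PeelLayer are hypothetical. The client encrypts the payload once per relay, innermost layer for the exit relay, and each relay removes exactly one layer.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext under key and prepends the random nonce.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext to its first argument, so the
	// result is nonce || ciphertext || tag.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// WrapOnion (hypothetical name) adds one encryption layer per relay.
// keys[0] belongs to the entry relay and keys[len(keys)-1] to the exit
// relay, so the exit relay's layer is applied first (innermost).
func WrapOnion(keys [][]byte, payload []byte) ([]byte, error) {
	msg := payload
	for i := len(keys) - 1; i >= 0; i-- {
		var err error
		if msg, err = seal(keys[i], msg); err != nil {
			return nil, err
		}
	}
	return msg, nil
}

// PeelLayer (hypothetical name) removes the outermost layer using one
// relay's key. It fails if the key is wrong or the data was modified.
func PeelLayer(key, onion []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(onion) < gcm.NonceSize() {
		return nil, errors.New("onion too short")
	}
	nonce, body := onion[:gcm.NonceSize()], onion[gcm.NonceSize():]
	return gcm.Open(nil, nonce, body, nil)
}

func main() {
	// Assumption: three relays with pre-shared random 16-byte keys.
	keys := make([][]byte, 3)
	for i := range keys {
		keys[i] = make([]byte, 16)
		if _, err := rand.Read(keys[i]); err != nil {
			panic(err)
		}
	}
	onion, err := WrapOnion(keys, []byte("GET / HTTP/1.1"))
	if err != nil {
		panic(err)
	}
	// Each relay in path order peels exactly one layer.
	msg := onion
	for hop, key := range keys {
		if msg, err = PeelLayer(key, msg); err != nil {
			panic(err)
		}
		fmt.Printf("after relay %d: %d bytes remain\n", hop+1, len(msg))
	}
	fmt.Printf("exit relay sees: %s\n", msg)
}
```

The sketch leaves out everything that makes the project non-trivial: circuit setup, per-hop routing information inside each layer, fixed-size messages, and replies travelling back along the path.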
Project idea: Build a peer-to-peer machine learning system
Machine learning is all the rage. There are many distributed frameworks, but all of them assume a centralized learning process with access to a central store of training data. Build a peer-to-peer solution for learning a global model (of a type of your choice) that has as few centralized components as possible and where data is spread across peers. Assume an adversarial context in which peers do not want to reveal their data to others. For this project you may want to recruit to your team someone who has taken CPSC 340 (and has done well in it). You can also substantially expand the security/privacy requirements of this project.
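One possible starting point, sketched below in Go, is gossip averaging: each peer trains on its own data and exchanges only model weights with neighbours, never the training data. This is a minimal sketch under stated assumptions, not a prescribed design. The Peer type, GossipStep, and GossipRounds are hypothetical names, the local training step is omitted, all weight vectors are assumed to have the same length, and peers are assumed to be arranged in a fixed ring.

```go
package main

import "fmt"

// Peer holds a locally trained model. In this sketch the training
// data stays on the peer; only Weights are ever exchanged.
type Peer struct {
	Weights []float64
}

// GossipStep replaces the weights of both peers with their
// element-wise average. The sum over all peers is unchanged, so
// repeated steps move every peer toward the global mean.
func GossipStep(a, b *Peer) {
	for i := range a.Weights {
		avg := (a.Weights[i] + b.Weights[i]) / 2
		a.Weights[i], b.Weights[i] = avg, avg
	}
}

// GossipRounds runs the given number of rounds; in each round every
// peer averages once with its successor on the ring.
func GossipRounds(peers []*Peer, rounds int) {
	for r := 0; r < rounds; r++ {
		for i := range peers {
			GossipStep(peers[i], peers[(i+1)%len(peers)])
		}
	}
}

func main() {
	// Assumption: three peers whose local models differ after training.
	peers := []*Peer{
		{Weights: []float64{0, 1}},
		{Weights: []float64{3, 1}},
		{Weights: []float64{6, 4}},
	}
	GossipRounds(peers, 20)
	for i, p := range peers {
		fmt.Printf("peer %d weights: %.4f\n", i, p.Weights)
	}
}
```

Note that sharing raw weights can still leak information about the underlying data, so this sketch does not by itself meet the adversarial requirement; it only shows the decentralized averaging step.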
Project idea: Build a distributed web crawler/search engine
Web crawling is kind of a 90s topic. But an efficient and scalable version is a complex distributed system with many interesting pieces. Last year's assignment 5 describes an 'assignment' version of a web crawler that is a good starting point. This version describes a set of worker crawlers that are spread over multiple data-centers, a web-graph that is maintained in a distributed fashion, a distributed page rank computation, and keyword search capability. You could extend this version or consider building a different variant.
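As a reference point for the page rank piece, here is a minimal single-machine sketch of the computation in Go using power iteration. It is not taken from the earlier assignment: the PageRank function name, the map-based graph representation, and the damping and iteration parameters are all assumptions made for illustration. A distributed version would partition the graph across workers and exchange rank contributions between them in each iteration.

```go
package main

import "fmt"

// PageRank runs a fixed number of power iterations over a link graph
// given as page -> outgoing links. Pages with no outgoing links
// spread their rank evenly over all pages.
func PageRank(links map[string][]string, damping float64, iters int) map[string]float64 {
	// Collect every page that appears as a source or a target.
	nodes := map[string]bool{}
	for src, outs := range links {
		nodes[src] = true
		for _, dst := range outs {
			nodes[dst] = true
		}
	}
	n := float64(len(nodes))

	// Start from a uniform distribution.
	rank := make(map[string]float64, len(nodes))
	for p := range nodes {
		rank[p] = 1 / n
	}

	for i := 0; i < iters; i++ {
		// Total rank currently held by pages without outgoing links.
		dangling := 0.0
		for p := range nodes {
			if len(links[p]) == 0 {
				dangling += rank[p]
			}
		}
		// Every page receives the teleport share plus an equal
		// share of the dangling rank.
		base := (1-damping)/n + damping*dangling/n
		next := make(map[string]float64, len(nodes))
		for p := range nodes {
			next[p] = base
		}
		// Each page splits its damped rank evenly over its links.
		for src, outs := range links {
			if len(outs) == 0 {
				continue
			}
			share := damping * rank[src] / float64(len(outs))
			for _, dst := range outs {
				next[dst] += share
			}
		}
		rank = next
	}
	return rank
}

func main() {
	// Assumption: a tiny hand-written web graph.
	graph := map[string][]string{
		"a": {"c"},
		"b": {"c"},
		"c": {"a"},
	}
	for page, r := range PageRank(graph, 0.85, 50) {
		fmt.Printf("%s: %.4f\n", page, r)
	}
}
```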
Some other project ideas
A project proposal is a paper that details the problem, your proposed approach/solution to the problem, a realistic timeline for your team's actions to create the solution, and a SWOT analysis for your team/project.
You should aim for a proposal that is about 5 pages long. If it is shorter, you are probably missing some detail; if it is longer, it becomes too detailed and too long to read. That said, there are no page limits (neither lower nor upper bounds) on your proposal.
Here are three example proposals from the last time I taught this course with an open-ended project:
Here are two high-level ways in which I think about your proposal:
You may also find the following proposal advice useful (from a grad course that I taught).
Your proposal must include a SWOT analysis, which is a project planning tool/exercise. The focus of the SWOT analysis should not be on your idea, but on the various factors that will influence your ability to execute successfully. This includes things like human resources, time/scheduling constraints, etc.
There are three key things you should focus on when you put this together:
Here are some fairly generic examples (i.e., yours should be more specific):
Internals (strengths/weaknesses):
Externals (threats/opportunities) -- you'll probably have fewer of these than the internal ones:
Your proposed project might evolve
The proposal is your best effort at scoping out the challenges that you expect to come up against and a plan for how you will resolve them. But, of course, system design and software engineering are not that predictable.
It's difficult to describe how much you can deviate from the proposal. For example, using UDP instead of TCP may not be a significant change for some proposals, but could be a major change for others (e.g., if you are investigating distributed congestion control adaptation in TCP and now change to UDP, the difference is major!).
Please discuss potential major changes with the TA assigned to your group and/or with Ivan.
There are no constraints on your distributed system design and implementation outside of the ones listed at the top. If you have any questions, please ask on Piazza.
Your final report is a description of the problem you attempted to solve, what you have built to solve the problem, why you built your system the way you did, and how the system works/doesn't work. You should aim for a final report that is no more than 10 pages long.
For further details about the report/code/demo deliverables and marking process, see this Piazza post.
All project 2 deliverables are due at 11:59PM on their respective dates.
The project is structured as a series of regularly occurring deadlines. Do not miss these! Each deadline deliverable must be submitted through stash by 11:59PM on the day of the deadline.
Sign up for a project stash repository here.
Project 2 is 35% of your final mark. Here is the mark breakdown:
This project is extensible with two kinds of extra credit:
Make sure to follow the course collaboration policy.