416 Distributed Systems: Assignment 3
Due: Feb 8th at 9PM
2016W2, Winter 2017
In this assignment you will extend your peer-to-peer system from assignment 2 to handle peer joins and peer failures.
The system has a similar structure and purpose as in assignment 2: there are some number of peers and there is one server. As before, the peers need to coordinate their server interactions to retrieve some number of resources, and distribute these resources appropriately. In this assignment you will add two more features to your system: (1) a protocol by which new peers can join the existing running peers, and (2) ability to detect and recover from peer failures.
Supporting joining and failing peers leads to two peer-server interaction changes:
Supporting joining and failing peers features will also substantially change how your peers coordinate and necessitates a re-definition of the three requirements in assignment 2. However, these two features do not change the three server-communication constraints in assignment 2 and your peers must continue to obey those constraints. As before, peers will print information to the screen and exit when there are no more resources. Below we list the updated peer-to-server protocol, precisely define the two new features, and update the three requirements for your solution. As before, the peer-to-peer protocol, supporting peer joins, failure detection, and failure recovery, is unspecified and is up to you to design and implement.
The peer-to-server protocol is similar to assignment 2. The server listens to connections from peers and expects two kinds of RPC invocations: InitSession and GetResource. The InitSession initiates a session and returns a token called sessionID that must be used in the call to GetResource. The GetResource call returns a string, that we term a resource, and also the logical peerID that is associated with the resource. Your peer code must appropriately maintain a map between logical and physical peerIDs. Resources for the same logical peerID must live on the same physical peer. However, this mapping can (and must be able to) change, for example, as peers fail. Precisely, these RPCs look like this:
In interacting with the server your peer group must continue to satisfy the three constraints: (Constraint1) the sessionID RPC must be invoked exactly once, (Constraint2) two consecutive invocations of the GetResource RPC cannot come from the same peer, and (Constraint3) the GetResource RPC must be invoked exactly (1 + numR) number of times, where numR is the numRemaining value received from the first call to GetResource.
Note that in the case that the system contains a single peer and the server returns a numRemaining value that is greater than 0, then the peer should block and wait until more peers join the system.
Peer joins. A peer can be started in 'bootstrap' mode or in 'joining' mode. These are indicated with a command line flag. In bootstrap mode the peer begins a new peer group and is the first peer in the group. In joining mode the peer knows of one other peer that is already in the group and must join the peer group through that peer. A peer has not technically joined the peer group until it invokes the JoinPrint function distributed in the starter code. Both joining and bootstrapping peers must invoke this function once they have joined.
Peer failures. Peers may fail in a fail-stop manner. Exceptions to when peers may fail are listed below under assumptions. A good way to think of this kind of failure is as if the peer stopped executing because the physical machine that hosted the peer has had its power unplugged. Peers that fail appear as having failed to all other peers simultaneously, possibly after a timeout that can be used to detect such failures.
The diagram on the right illustrates how your system might evolve over time. Initially just the server is part of the system, then peer 1, a bootstrapping peer joins; then, two more peers join, and finally peer 1 fails. This is one example scenario. Your code must support every possible combination of peer joins and failures.
Your peer group must satisfy three (updated) requirements:
You will test and debug your solution against a server that we will give to you. Your peer can be run in one of two modes, in both modes the peer is given a physical peer ID (the peer's identity among peers) and an IP:port to use for peer-to-peer communication. In bootstrap mode the peer is also given the server's RPC TCP IP:port, while in joining mode the peer is instead given the IP:port of a peer that is currently part of the peer group.
Assumptions you can make
Assumptions you cannot make
Write a single go program called peer.go that acts as a peer in the protocol described above. Your program must implement the following two modes of command line usage:
Bootstrap mode:go run peer.go -b [physical-peerID] [peer-IP:port] [server ip:port]
Joining mode:go run peer.go -j [physical-peerID] [peer-IP:port] [other-peer ip:port]
Download the starter code. Please carefully read and follow the RPC data structure comments at top of file.
You can also download and use a bash script to automatically start several peers.
Download the server binary. This binary should be run on a CS ugrad server using ./server [local RPC ip:port] [numLogicalPeers], where the first argument is the TCP IP:port on which to listen for incoming RPC connections from peers, and numLogicalPeers is the number of logical peers to use.
Rough grading rubric