In this assignment you will create a simple distributed
key-value service to replace the centralized key-value service
from assignment 3. Your service will use a
collection of nodes to distribute the load of storing keys/values and
will include a designated front-end node to mediate access to the
distributed key-value service.
Overview: The centralized key-value service from assignment 3
has several drawbacks: the hosting node is a single point of failure,
the hosting node may run out of storage capacity or become overloaded
with client requests. In this assignment you will distribute the
storage of keys/values (across a set of kv nodes), but you will retain
a single node (the front-end) for accessing the key-value
service. Your system will have the following architecture:
Note that the front-end node mediates all access to the kv nodes and
the kv nodes do not communicate among each other. As in the previous
assignment your system must work with an arbitrary number of clients
(N), and must also work with an arbitrary number of kv nodes
(K). Further, your system must provide an identical interface to the
client nodes (as in the previous assignment), balance the storage load
of keys/values across the kv nodes, and handle joining and failing kv
nodes.
Front-end node. The front-end is responsible for (1) responding
to client queries, (2) tracking kv nodes as they join and fail, and
(3) partitioning the key-space between the kv nodes (i.e., determining
which kv node is responsible for storing which set of keys/values).
KV nodes. A kv node is a storage node that associates some set
of keys with values. A kv nodes communicates solely with the front-end
node. A kv node does not maintain any durable state.
Implementation requirements:
-
You cannot change the client API of the key-value service
from assignment 3. This API and its
semantics must remain identical. That is, your code from the previous
assignment must continue to interoperate and work with your new
distributed key-value service.
-
Exactly one kv node must be responsible for storing the value of a
key. If this kv node fails, the key becomes (and must remain)
unavailable.
- Given a sufficiently long time during which kv node joins and
failures do not occur and new keys are not created (i.e., written to
by clients), the number of keys stored by kv nodes is balanced. That
is, the front-end must actively load balance the storage load across
kv nodes.
-
As in the previous assignment you must use
the GoVector
library to track the partial ordering between important events
(e.g., message sends/receives and synchronization events) in your
distributed system and to log these events to a log file.
-
The front-end must register itself in the GoVector timestamps (i.e.,
it should also maintain a GoVector log file).
Assumptions you can make:
- The front-end node does not fail
- No network failures
Assumptions you cannot make:
- Processes have synchronized clocks (e.g., running on the same
physical host).
Advice:
-
As in assignment 3, be methodical and create the distributed key-value
service step-by-step. First, create the front-end that maps all keys
to a single, hard-coded, key-value service node. Do not continue until
you have a working version of this basic service. Then, consider how
to add new key-value service nodes. Solve key load-balancing in the
final step (it is an optimization, so leave it until the very end)
-
The API between the front-end and the key-value nodes is up to
you. Feel free to define an API of your choice.
-
Clients and the front-end must communicate using RPC. The front-end
and the kv nodes may communicate using whatever protocol you choose.
Solution spec: Write two go programs:
- A program that implements the front-end node, as described above,
and takes the following arguments:
- ip:port : address of the key-value service to be used by clients
- ip':port' : address to be used by kv nodes
- logfile : filename to which GoVector log messages will be written
- A program that implements the kv node, as described above, and
takes the following arguments:
- ip':port' : address of the front-end node
- logfile : filename to which GoVector log messages will be written
What to hand in:
- A writeup (at most one page long) that describes your design
and how it satisfies the above requirements
- Your implementations of a front-end node and kv node
- Three GoVector-based logs, each one a concatenation of the
individual log files generated by the set of all nodes in the
system. Each log must correspond to one of the following scenarios:
- log-kv-join.txt : The first kv node joins the system and
the service handles a few client operations.
- log-kv-join-fail.txt : Two kv nodes join the system, the
service handles some client operations. One of the kv nodes fails, and
the keys it stored become unavailable to clients.
- log-kv-balance.txt : A log that demonstrate active storage
load balancing of keys by the front-end in response to new keys being
created, or kv nodes joining/failing.
I recommend that you use ShiViz to inspect each of these logs to check
that they indeed capture correct behavior of your system.
Also, note that your system must handle other possible scenarios (not
just the ones described above).
Meta:
Make sure to follow the
course collaboration policy and refer
to the assignments instructions
that detail how to submit your solution.
|