Assignment 2

538B Distributed Systems: Assignment 2

Due: Oct 20th at 6PM

Fall 2016

This assignment will teach you about clock synchronization in a distributed setting. You will implement the Berkeley algorithm.

High-level protocol description: There are two kinds of nodes in the system: a single master node, and a set of slave nodes. Periodically, the master queries all of the slave nodes for their local time and each node replies with their current time. The master then computes the time difference between its local clock and that of each of the slaves (deriving a set of delta values, one per slave). Next, the master computes avg, a fault tolerant average of the delta values: an average over the largest set of delta values that differ from one another by at most a pre-defined constant d. Then, for each slave the master computes a time correction value by computing the difference between avg and the delta value for the slave. Finally, the master sends this correction value to each slave for the slaves to adjust their clocks.

Low-level protocol description: See the original paper for an example protocol execution and details of how the master computes the delta values. This is described at the beginning of Section III. Section IV details the protocol.

Implementation requirements:

The master must also synchronize its time along with the slaves (as described in the original paper)
As long as a slave is alive its time should eventually be synchronized (with times of the other alive slaves and the master)
UDP must be used for messaging
Your implementation must be robust to message loss
Your implementation must be robust to message delay (e.g., slave replies to previous synchronization rounds should be ignored)
Your implementation must be robust to slave halting failures (the master should proceed with synchronization as long as at least one slave replies with its time)
Each node in your implementation should maintain and synchronize a process-local clock (not the system clock).
Each node should periodically log its local time to stdout, particularly before and after synchronization.
You must use the GoVector library to track the partial ordering between important events (e.g., message sends/receives and synchronization events) in your distributed system and to log these events to a log file.

Assumptions you can make:

The master node does not fail
The set of slave nodes (alive and failed) does not change
Clock values are monotonically increasing positive integers (e.g., Unix timestamps)

Advice:

Avoid explicit handling of message loss
Bound how long a synchronization round may last
The GoVector library can help you debug your implementation. Use the ShiViz tool to visualize the GoVector-generated log.

Solution spec: Write a single go program that acts as a time master or a time slave, depending on the command line options passed to the program. Given a -m flag in the first position the program should behave as a time master and expect the following arguments on the rest of the command line:

ip:port : address that the master should use to communicate with all of the slaves
time : the process-local time that the master should be initialized with
d : the d threshold used in the fault tolerant average (see above)
slavesfile : a file with as many lines as there are slaves, each line contains an "ip:port" address of a slave
logfile : filename to which GoVector log messages will be written

Given a -s flag in the first position the program should behave as a time slave and should expect the following three arguments on the rest of the command line

ip:port : address that the slave should listen on for server's messages
time : the process-local clock value that the slave should be initialized with
logfile : filename to which GoVector log messages will be written

Your implementation should have a tick loop that updates the clock value and pins it to the rate of the physical clock on the host.

Meta: Make sure to follow the course collaboration policy and refer to the assignments instructions that detail how to submit your solution.