This assignment will teach you about clock synchronization in a
distributed setting. You will implement
the Berkeley
algorithm.
High-level protocol description: There are two kinds of nodes
in the system: a single master node, and a set of slave
nodes. Periodically, the master queries all of the slave nodes for
their local time and each node replies with their current time. The
master then computes the time difference between its local clock and
that of each of the slaves (deriving a set of delta values,
one per slave). Next, the master computes avg, a fault
tolerant average of the delta values: an average over the largest set
of delta values that differ from one another by at most a pre-defined
constant d. Then, for each slave the master computes a time
correction value by computing the difference between avg and
the delta value for the slave. Finally, the master sends this
correction value to each slave for the slaves to adjust their clocks.
Low-level protocol description: See
the original
paper for an example protocol execution and details of how the
master computes the delta values. This is described at the beginning
of Section III. Section IV details the protocol.
Implementation requirements:
- The master must also synchronize its time along with the slaves
(as described in the original
paper)
- As long as a slave is alive its time should eventually be
synchronized (with times of the other alive slaves and the
master)
- UDP must be used for messaging
- Your implementation must be robust to message loss
- Your implementation must be robust to message delay (e.g., slave
replies to previous synchronization rounds should be ignored)
- Your implementation must be robust to slave halting failures (the
master should proceed with synchronization as long as at least one
slave replies with its time)
- Each node in your implementation should maintain and synchronize a
process-local clock (not the system clock).
-
Each node should periodically log its local time to stdout,
particularly before and after synchronization.
-
You must use
the GoVector
library to track the partial ordering between important events
(e.g., message sends/receives and synchronization events) in your
distributed system and to log these events to a log file.
Assumptions you can make:
- The master node does not fail
- The set of slave nodes (alive and failed) does not change
- Clock values are monotonically increasing positive integers
(e.g., Unix timestamps)
Advice:
- Avoid explicit handling of message loss
- Bound how long a synchronization round may last
- The GoVector
library can help you debug your implementation. Use
the ShiViz
tool to visualize the GoVector-generated log.
Solution spec: Write a single go program that acts as a time
master or a time slave, depending on the command line options passed
to the program. Given a -m flag in the first position the
program should behave as a time master and expect the following
arguments on the rest of the command line:
- ip:port : address that the master should use to communicate with all of the slaves
- time : the process-local time that the master should be initialized with
- d : the d threshold used in the fault tolerant average (see above)
- slavesfile : a file with as many lines as there are slaves, each
line contains an "ip:port" address of a slave
- logfile : filename to which GoVector log messages will be written
Given a -s flag in the first position the program should
behave as a time slave and should expect the following three arguments
on the rest of the command line
- ip:port : address that the slave should listen on for server's messages
- time : the process-local clock value that the slave should be initialized with
- logfile : filename to which GoVector log messages will be written
Your implementation should have a tick loop that updates the clock
value and pins it to the rate of the physical clock on the host.
Meta:
Make sure to follow the
course collaboration policy and refer
to the assignments instructions
that detail how to submit your solution.
|