416 Distributed Systems: Project 1

Due: November 4 at 11:59pm

Fall 2018

In this project you will work in a team of two to build a simple distributed file system that runs on top of a custom blockchain network. Your records file system (RFS) will have a global name space and no access control mechanism. Files in RFS are composed of fixed-length records that can either be read or appended (previously appended records cannot be modified). Your blockchain network will be a randomly connected peer-to-peer network that is loosely based on Bitcoin: clients submit operations on files, these operations are flooded among nodes, and nodes mine blocks to add operations to the blockchain. You'll use proof of work to counter Sybil nodes/spam and incentivize mining with record coins, which are units of storage in RFS and are necessary for clients to create files/add new records to files.

There are several extra credit options, including adding new APIs (deleting a file), and creating a rogue client to demonstrate an attack on your system.

You will deploy and test your RFS system in VMs across multiple data centers in the Azure cloud.

Nodes overview

There are two types of nodes in your system: some number of miners and some number of clients. Together these nodes belong to an instance of the RFS system (a single blockchain and a single global name space of record-based files).

Peer-to-peer graph of miners

RFS miners form a peer-to-peer graph (based on command-line information that tells each miner who their immediate peers are). Miners only communicate with their immediate peers. A miner can only initiate connections to those miners specified as part of the initial command line arguments. To be able to communicate with the other miners in the system, a miner must use a flooding-style protocol, which you will design and implement.
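One common way to make flooding terminate is to remember which messages a miner has already seen and to forward each message at most once. The sketch below illustrates that idea only; the Message and Flooder types, the string IDs, and the per-peer send functions are all hypothetical names, not part of the provided code, and your own protocol design may differ.

```go
package main

import "sync"

// Message is a hypothetical flooded item (an operation or a block),
// identified by a unique ID chosen by its originator.
type Message struct {
	ID      string
	Payload []byte
}

// Flooder remembers which message IDs this miner has already seen so that
// each message is forwarded at most once, which makes the flood terminate.
type Flooder struct {
	mu    sync.Mutex
	seen  map[string]bool
	peers map[string]func(Message) // peer address -> send function (assumed)
}

// NewFlooder returns a Flooder with no peers and no seen messages.
func NewFlooder() *Flooder {
	return &Flooder{
		seen:  make(map[string]bool),
		peers: make(map[string]func(Message)),
	}
}

// AddPeer registers a send function for an immediate peer.
func (f *Flooder) AddPeer(addr string, send func(Message)) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.peers[addr] = send
}

// Receive handles a message that arrived from peer `from` (use "" for a
// local client). It returns true if the message was new and was forwarded
// to every peer other than the sender, and false if it was a duplicate.
func (f *Flooder) Receive(from string, m Message) bool {
	f.mu.Lock()
	if f.seen[m.ID] {
		f.mu.Unlock()
		return false
	}
	f.seen[m.ID] = true
	// Copy the targets so that sends happen outside the lock.
	var targets []func(Message)
	for addr, send := range f.peers {
		if addr != from {
			targets = append(targets, send)
		}
	}
	f.mu.Unlock()
	for _, send := range targets {
		send(m)
	}
	return true
}
```

In a real miner the send functions would write to network connections, and the seen set would need some bound or cleanup policy, which this sketch leaves out.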

Miners may join the network and they may depart (e.g., because of failure). If a miner knows of at least one peer, then it is connected. If the miner happens to lose connectivity to all of the peers that it knows about, then that miner becomes disconnected. A disconnected miner can only become connected if another miner connects to it (it has no way of learning about other miners while it is disconnected). Note that a miner must always be able to accept incoming connections from any other miner (that knows the right IP:port to connect to this miner).

The diagram below illustrates a connected graph of connected miners, with each miner connected to at least three others. Miner A has three peers: nodes B, C, D. Node E is a disconnected miner as it has no peers.

An RFS miner has two roles: it mines RFS coins for clients to consume, and it participates in the network to help maintain the blockchain. Miners have the following responsibilities:

Each miner must disseminate operations sent to it by clients that are connected to it to the rest of the miner network. The miner must also help other miners with their operation dissemination.
A miner mines record coins for the clients to use (clients use the RFS lib to implement applications that use record coins to accomplish their goal, such as adding records to a file). Mining is done using a proof of work algorithm based on assignment 1 from the 2017w2 offering of this course.
By solving the proof of work, the miner generates a block for the blockchain and some number of coins that is credited to an identifier that the miner was initialized with. The generated block includes the set of RFS operations that the miner knows about (or is a no-op block; see below).
A miner validates mined blocks it receives and disseminates valid blocks (that it mined locally and that other miners mined) to the rest of the miner network.

Clients receive service from miners

RFS clients are applications that use the RFS lib to connect to a miner and to submit operations on files stored by RFS. RFS clients are (1) unaware of the blockchain, (2) utilize the record coins mined by the miner that they connect to, and (3) are the only way for applications to access/interact with RFS.

The diagram below shows the miners network from above with several clients. Note that a miner can have a variable number of clients: some miners may have no clients, and others may have many clients. Also note that each client is only connected to exactly one miner and that clients do not communicate with each other.

The diagram below captures the design and implementation of your system in layer-form:

RFS specification

RFS uses the blockchain as storage and organizes this storage into a set of files that live in a global name space. This name space is accessible (readable and writeable) to any node that is part of the network. Note that there is no access control in RFS; in general, this system design is simple and extremely insecure. There are also no directories in RFS, only files. RFS files can be created, but never destroyed. An RFS file has a name that is at most 64 bytes long, can have at most 65,535 (the maximum uint16 value) records, and is composed of a totally-ordered sequence of fixed-size records. In your implementation you'll use a record size of 512 bytes.
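The limits above can be captured as constants and a fixed-size array type. This is a minimal sketch: the names MaxFileNameLen, MaxRecords, RecordSize, ValidFileName and RecordFromString are illustrative and are not taken from the provided rfslib.go, and rejecting empty file names is an assumption the spec does not state.

```go
package main

import "errors"

const (
	MaxFileNameLen = 64    // bytes, from the spec
	MaxRecords     = 65535 // largest uint16 value, from the spec
	RecordSize     = 512   // bytes per record, from the spec
)

// Record is one fixed-size RFS record.
type Record [RecordSize]byte

// ValidFileName reports whether name fits the 64-byte limit.
// Rejecting the empty name is an assumption; the spec does not say.
func ValidFileName(name string) bool {
	return len(name) > 0 && len(name) <= MaxFileNameLen
}

// RecordFromString packs s into a zero-padded record, or returns an error
// if s is longer than one record.
func RecordFromString(s string) (Record, error) {
	var r Record
	if len(s) > RecordSize {
		return r, errors.New("string does not fit in one record")
	}
	copy(r[:], s)
	return r, nil
}
```

Note that len on a Go string counts bytes, which matches the byte-based limits in the spec.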

RFS has a limited API that allows a client to list the existing file names, create a new file, append a record to an existing file, find out the number of records in a file, and read a record at a specific index in a file. This API is detailed further below. As in A1, you are not allowed to change this API. We also provide several test client programs that you can use to test your API.

Because RFS is built on top of a blockchain, RFS will provide eventual consistency semantics to clients.

RFS library API

The RFS library exposes an interface that is detailed in the rfslib.go file. You are not allowed to change this interface. All of the calls require connectivity and return DisconnectedError if the library is no longer connected to a miner or if the miner is disconnected (has no peers). All of the calls below are blocking calls: they do not return until they have finished successfully or an error has occurred. In particular, calls that require record coins to succeed must block until (1) the miner has mined sufficient coins to process the operation, and (2) the operation succeeds (is confirmed by the network). Furthermore, RFS calls only return state that has been confirmed (e.g., ListFiles only returns files that have been confirmed).

rfs, err ← Initialize(localAddr, minerAddress)

Initializes the RFS library. The client provides the local address to use to connect to the miner, and the address of the miner to connect to, which is mining on behalf of this client. Returns error, or if error is not set, then an rfs instance.

err ← CreateFile(fname)

Creates a file with name fname. Returns a FileExist error if fname already exists.

flist, err ← ListFiles()

Retrieves a list of existing files in the global namespace.

num-recs, err ← TotalRecs(fname)

Counts and returns the number of records in a file identified by the name fname. Returns a FileDoesNotExist error if fname does not exist.

err ← ReadRec(fname, index, *record)

Retrieves a record at a specific index in the file identified by name fname. Note that if a record at this index does not yet exist, this call must block until the record at this index exists, and then return the record (through the passed-in pointer). Returns a FileDoesNotExist error if fname does not exist.

recordNum, err ← AppendRec(fname, record)

Appends a record to the file identified by name fname. Returns recordNum, which is the position of the newly appended record. Note that this call does not take an index: a record should be appended to the end of the file, wherever that might be. Returns a FileDoesNotExist error if fname does not exist.

Blockchain overview

Miners maintain a tree representation of the blockchain. The 'chain' refers to the longest path in this tree, starting at the genesis block whose hash is specified on the command line. A miner should only compute no-op and op blocks along the chain, and not along any shorter path in the tree. If there are several longest chains, the miner should (1) pick one that does not cause a validation error for the op block it is currently generating; (2) if no op block is being generated, or if none of the chains causes a validation error, the miner should pick among the chains uniformly at random.
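Finding the longest chain amounts to computing each block's depth below the genesis block and collecting the deepest blocks (the 'tips'). The sketch below assumes a hypothetical Block type that stores its own hash and its parent's hash, and a map of all known blocks keyed by hash; it ignores the validation-based tie-break in step (1) and shows only the uniform random choice in step (2), with the random source passed in as a function (for example rand.Intn from math/rand).

```go
package main

import "sort"

// Block is a minimal stand-in for a block in the tree (fields assumed).
type Block struct {
	Hash     string
	PrevHash string
}

// LongestTips returns the hashes of all blocks that end a longest path
// from the genesis block, together with the length of that path in blocks.
// Blocks that do not connect back to genesis are ignored.
func LongestTips(blocks map[string]Block, genesis string) ([]string, int) {
	depth := map[string]int{genesis: 0}
	var depthOf func(h string) int
	depthOf = func(h string) int {
		if d, ok := depth[h]; ok {
			return d
		}
		b, ok := blocks[h]
		if !ok {
			return -1 // unknown block: not attached to genesis
		}
		depth[h] = -1 // guard against malformed cycles
		if p := depthOf(b.PrevHash); p >= 0 {
			depth[h] = p + 1
		}
		return depth[h]
	}
	best := 0
	tips := []string{genesis}
	for h := range blocks {
		d := depthOf(h)
		switch {
		case d > best:
			best, tips = d, []string{h}
		case d == best && d > 0:
			tips = append(tips, h)
		}
	}
	sort.Strings(tips) // deterministic order, independent of map iteration
	return tips, best
}

// PickTip chooses one tip using intn, which must return a uniformly random
// integer in [0, n), such as rand.Intn.
func PickTip(tips []string, intn func(n int) int) string {
	return tips[intn(len(tips))]
}
```

The recursive depth computation is fine for a sketch; a long-running miner would more likely maintain depths incrementally as blocks arrive.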

The miners flood two kinds of state: operations, and blocks. Operations can be of two types: create file (generated by a CreateFile API call) and append record (generated by an AppendRec API call). Note that all other API calls must be serviced by the miner that the client is connected to. For example, the call to ReadRec should not generate any network traffic between miners.

Block generation. Each miner must implement a mining procedure by which it can generate record coins and add a new block to the blockchain. A miner can only compute one block at a time and cannot work on multiple blocks simultaneously. There are two kinds of blocks: no-op blocks and op blocks.

A no-op block does not contain any operations. These blocks are generated to prevent pre-computation of long block sequences by malicious nodes and to confirm existing operations (more about confirmation below). All miners should always be working on no-op blocks in the background, constantly generating these and adding them to the blockchain.
An op block contains several operations. When a client notifies rfslib of an operation and rfslib sends it to its local miner, the miner immediately (1) floods the operation to other miners it is connected to, (2) stops working on no-op blocks (but not op blocks), and (3) switches to work on an op block to integrate the new operation into the blockchain. The miner can and should integrate multiple operations that it knows about (potentially broadcast/generated by other miners) into one op block.

We are not going to constrain the size of your block. We recommend that you choose a block size that can hold no more than 100 operations. Miner settings specify a GenOpBlockTimeout parameter. This parameter specifies the minimum time in milliseconds between generating op blocks. This timeout allows the miner to aggregate multiple operations into a single op block and also prevents the miner from blocking op block generation when there are only a few operations. The following diagram explains how your mining algorithm should behave over time and the role of the GenOpBlockTimeout:

Block data structure. An op block is a data structure that contains at least the following data:

A hash of the previous block in the chain (prev-hash)
An ordered set of operation records; each operation record should include:

rfslib operation details (op)
An identifier that specifies the miner identifier whose record coins sponsor this operation (op-minerID)

The identifier of the miner that computed this block (block-minerID). This ID will be provided as part of the settings file for each miner. This ID is used by the rest of the network to credit this miner ID with the record coins rewarded with mining a block in the system.
A 32-bit unsigned integer nonce (nonce)

Block hashes (e.g., prev-hash) must be MD5 hashes and must contain a specific number of zeroes at the end of the hex representation of the hash (a constant specified in arguments from the command line). An op block's hash is a hash of all the block contents. The goal of the proof-of-work algorithm is to find a nonce such that the block's hash contains the required number of zeroes. The more zeroes, the longer it takes to generate the block (find a nonce that works). More precisely, the miner must find a nonce value such that the MD5 hash value of the block contents has at least N zeroes at the end of the hash (where N is a configuration setting argument).
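The block fields and the nonce search described above can be sketched as follows. The struct and field names, and in particular the way the block contents are serialized before hashing, are assumptions made for illustration; any deterministic encoding that all of your miners agree on would serve. Only crypto/md5 and other standard library packages are used.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// Op is one operation record in a block (field names assumed).
type Op struct {
	Details string // rfslib operation details (op); encoding is assumed
	MinerID string // op-minerID: miner whose record coins sponsor the op
}

// Block holds the minimum data the spec lists for a block.
type Block struct {
	PrevHash string // prev-hash
	Ops      []Op   // empty for a no-op block
	MinerID  string // block-minerID
	Nonce    uint32 // nonce
}

// Hash returns the hex MD5 hash over all block contents. The serialization
// used here (quoted fields separated by '|') is an assumption.
func (b Block) Hash() string {
	h := md5.New()
	fmt.Fprintf(h, "%q|%q|%d|", b.PrevHash, b.MinerID, b.Nonce)
	for _, op := range b.Ops {
		fmt.Fprintf(h, "%q|%q|", op.MinerID, op.Details)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// ValidPoW reports whether hash ends with at least n '0' hex digits.
func ValidPoW(hash string, n uint8) bool {
	return strings.HasSuffix(hash, strings.Repeat("0", int(n)))
}

// Mine tries nonces in increasing order until the block hash satisfies the
// difficulty n. It returns false if the 32-bit nonce space is exhausted.
func Mine(b *Block, n uint8) bool {
	for nonce := uint32(0); ; nonce++ {
		b.Nonce = nonce
		if ValidPoW(b.Hash(), n) {
			return true
		}
		if nonce == ^uint32(0) {
			return false
		}
	}
}
```

A real miner would also need a way to interrupt Mine, for example when a new block arrives from a peer or when it switches from a no-op block to an op block; that is omitted here.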

You can use sample code from 2017w2 assignment 1 as a starting point for your proof of work code (see the computeNonceSecretHash method).

Note that a block hash computed for a block by a miner A will always be different from the hashes of blocks generated by other miners (for the same op/no-op) and will always differ from the hashes of other blocks generated by miner A. This is because a block contains prev-hash, which uniquely identifies its position in the blockchain tree, and a block contains the miner's ID, which makes each block unique to miner A.

A no-op block is identical to an op block except that it does not include any operations. Its hash is similarly computed using a proof-of-work algorithm.

Block validation. This project assumes some amount of trust between miners. For example, miners are trusted to associate their IDs with operations of clients that are connected to them. However, it is important that the miners validate blocks. Here is a minimal set of validations that each miner should perform:

Block validations:

Check that the nonce for the block is valid: PoW is correct and has the right difficulty.
Check that the previous block hash points to a legal, previously generated, block.

Operation validations:

Check that each operation in the block is associated with a miner ID that has enough record coins to pay for the operation (i.e., the number of record coins associated with the minerID must have sufficient balance to 'pay' for the operation).
Check that each operation does not violate RFS semantics (e.g., a record is not mutated or inserted into the middle of an RFS file).
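The two operation validations above can be checked by replaying a block's operations, in order, against a copy of the state reached at the block's parent. The sketch below assumes a hypothetical State type holding per-miner balances and per-file record counts; the type names, the error values, and the decision to reject an append to a full file are illustrative choices rather than requirements of the spec.

```go
package main

import "errors"

// OpKind distinguishes the two operation types that appear in blocks.
type OpKind int

const (
	CreateFile OpKind = iota
	AppendRec
)

// Op is a simplified operation record (field names assumed).
type Op struct {
	Kind    OpKind
	File    string
	MinerID string // op-minerID: whose record coins pay for the operation
}

// State is the view obtained by replaying the chain up to the parent block.
type State struct {
	Balances map[string]int // miner ID -> record coins
	Records  map[string]int // file name -> number of records
}

const maxRecords = 65535

var (
	ErrFileExists   = errors.New("file already exists")
	ErrNoSuchFile   = errors.New("file does not exist")
	ErrFileFull     = errors.New("file has the maximum number of records")
	ErrInsufficient = errors.New("insufficient record coins")
	ErrUnknownOp    = errors.New("unknown operation")
)

// CheckOps validates ops in order against a copy of s and returns the first
// violation found, or nil. createCost is NumCoinsPerFileCreate; an append
// always costs exactly 1 record coin. s itself is not modified.
func CheckOps(s State, ops []Op, createCost int) error {
	bal := make(map[string]int, len(s.Balances))
	for id, coins := range s.Balances {
		bal[id] = coins
	}
	recs := make(map[string]int, len(s.Records))
	for name, n := range s.Records {
		recs[name] = n
	}
	for _, op := range ops {
		n, exists := recs[op.File]
		cost := 1
		switch op.Kind {
		case CreateFile:
			if exists {
				return ErrFileExists
			}
			cost = createCost
		case AppendRec:
			if !exists {
				return ErrNoSuchFile
			}
			if n >= maxRecords {
				return ErrFileFull
			}
		default:
			return ErrUnknownOp
		}
		if bal[op.MinerID] < cost {
			return ErrInsufficient
		}
		bal[op.MinerID] -= cost
		if op.Kind == CreateFile {
			recs[op.File] = 0
		} else {
			recs[op.File] = n + 1
		}
	}
	return nil
}
```

Because appends carry no index in this sketch, a record can only ever land at the end of a file, so mutation or insertion in the middle cannot be expressed at all.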

Operation confirmation. As noted in the previous section, RFS API calls that require new operations in the blockchain (AppendRec and CreateFile) must be confirmed before returning. Confirmation of an operation means that the block containing the operation must be followed by some fixed number of other blocks in the longest blockchain. This confirmation number of blocks is provided in command line settings to the miner and may differ for append and create file operations. Confirmation is necessary in a blockchain-based system because of concurrency: if two different blocks containing operations are generated at the same time, then only one of these blocks (and corresponding operations) will end up on the longest chain. Confirmation blocks give the client further assurance that the block with their operation is indeed part of the longest blockchain (the probability of another chain being longer diminishes with more confirmations due to proof of work).

Note that read-only calls, such as ListFiles and ReadRec, should only return state that has been confirmed. For example, ReadRec should not return record data if it is part of a block that has not been confirmed (by definition above).
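The confirmation rule reduces to counting how many blocks follow a given block on the longest chain. A minimal sketch, assuming the chain is available as a slice of block hashes ordered from the genesis block to the tip (the function names are hypothetical):

```go
package main

// Confirmations returns how many blocks follow the block with hash target
// on chain, which lists block hashes from genesis to tip. It returns -1 if
// target is not on the chain at all.
func Confirmations(chain []string, target string) int {
	for i, h := range chain {
		if h == target {
			return len(chain) - 1 - i
		}
	}
	return -1
}

// Confirmed reports whether the block with hash target is on the chain and
// is followed by at least required blocks. required would come from
// ConfirmsPerFileCreate or ConfirmsPerFileAppend.
func Confirmed(chain []string, target string, required uint8) bool {
	return Confirmations(chain, target) >= int(required)
}
```

Since the longest chain can change, a blocked CreateFile or AppendRec call would need to re-evaluate this check whenever the miner's chain is updated.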

Servicing rfslib API calls. The rfslib must implement the rfslib interface. However, the way to accomplish this is to send each operation to the miner and have the miner process the operation, either locally or by coordinating with the broader miner network. As noted above only two types of operations are flooded in the miner network (and integrated into blocks by all miners): file appends and file creates. All other operations are serviced locally because they do not need to be integrated into the blockchain (e.g., there is no record of a ReadRec in the blockchain). To service an operation locally, the miner maintains a view of RFS global state for the longest chain in the blockchain that it knows about (e.g., the files that exist at this point and their contents). The miner would service some client RFS operations based on this view. Note that the miner must continue to respond to client operations (and it must support multiple clients) even though the blockchain changes.
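One way to maintain the local view described above is to replay the longest chain and apply only those operations that already have enough confirmations. The sketch below is an illustration under several assumptions: the Op and Block types are hypothetical, record contents are plain strings rather than 512-byte records, and the chain is a slice ordered from the first block after genesis to the tip.

```go
package main

// Op is a simplified operation (field names assumed).
type Op struct {
	Create bool   // true: create File; false: append Data to File
	File   string // file name
	Data   string // record contents (a string here for brevity)
}

// Block is reduced to the operations it carries.
type Block struct{ Ops []Op }

// ConfirmedView replays chain (oldest block first) and returns, for each
// confirmed file, its confirmed records in order. An operation is applied
// only if its block is followed by at least confCreate (for creates) or
// confAppend (for appends) blocks.
func ConfirmedView(chain []Block, confCreate, confAppend int) map[string][]string {
	view := make(map[string][]string)
	for i, b := range chain {
		following := len(chain) - 1 - i
		for _, op := range b.Ops {
			switch {
			case op.Create && following >= confCreate:
				view[op.File] = []string{}
			case !op.Create && following >= confAppend:
				if recs, ok := view[op.File]; ok {
					view[op.File] = append(recs, op.Data)
				}
			}
		}
	}
	return view
}
```

ListFiles, TotalRecs and ReadRec could then be answered from this map without any traffic between miners. Rebuilding the view from scratch on every chain change is simple but slow; since marks depend on correctness rather than performance, that may be an acceptable starting point.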

Other notes:

Two concurrent and conflicting operations (i.e., two calls to CreateFile with the same filename by two different clients) cannot be in the same block. Your design must guarantee this.
Conflicting operations also cannot appear along the longest path in the blockchain (this check must be part of validating an operation). For example, this means that if two clients simultaneously issue a CreateFile with the same filename, then one of the clients should (eventually) get back an error. The other operation must succeed. In general, exactly one of the conflicting operations must succeed, while the others must get an error.
All accounting in the blockchain is done in terms of record coins, the state of which is maintained by the blockchain. All miners start with a record coin balance of 0. Record coins can only be created during block mining, and record coins are only used up, or spent, when a file is created or when a new record is appended to a file. An append of a record to a file always requires exactly 1 record coin. The other coin generation/usage values are parameterized in command line settings.
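The accounting rule above can be expressed as a replay over the longest chain: credit the miner of each block, then charge the sponsoring miner of each operation. The sketch below uses hypothetical Block, Op and CoinSettings types and computes only final balances, so it does not take a position on whether a block's reward may pay for operations inside that same block.

```go
package main

// Op is a simplified operation record (field names assumed).
type Op struct {
	Create  bool   // true: create file; false: append record
	MinerID string // op-minerID: the miner whose coins pay for the op
}

// Block is reduced to what coin accounting needs.
type Block struct {
	MinerID string // block-minerID: credited with the mining reward
	Ops     []Op   // empty for a no-op block
}

// CoinSettings mirrors the three coin-related settings of the miner.
type CoinSettings struct {
	MinedCoinsPerOpBlock   int
	MinedCoinsPerNoOpBlock int
	NumCoinsPerFileCreate  int
}

// Balances replays chain and returns the record coin balance of every
// miner ID that appears in it. All miners start with a balance of 0.
func Balances(chain []Block, s CoinSettings) map[string]int {
	bal := make(map[string]int)
	for _, b := range chain {
		if len(b.Ops) > 0 {
			bal[b.MinerID] += s.MinedCoinsPerOpBlock
		} else {
			bal[b.MinerID] += s.MinedCoinsPerNoOpBlock
		}
		for _, op := range b.Ops {
			if op.Create {
				bal[op.MinerID] -= s.NumCoinsPerFileCreate
			} else {
				bal[op.MinerID]-- // an append always costs exactly 1 coin
			}
		}
	}
	return bal
}
```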

Handling failures

Miners and clients may fail stop. If a client fails while it is blocking on an operation, the miner may abort the operation or it may commit the operation. If the miner fails, the rfslib instances of clients that are connected to this miner should return a DisconnectedError to the application.

You can assume that a miner that has failed does not ever rejoin the same RFS system instance with the same miner ID.

Miners should be able to detect peer failures by using some form of failure detection. You may want to use your A1 fdlib for this (though consider making it less aggressive by introducing a delay between heartbeat messages). You may use the same mechanism to detect miner failure at clients and client failure at miners.

Miners and clients should not store or cache any state on disk. A failure therefore completely wipes out the node's state.

As long as a miner is connected to at least one other miner, it should continue to operate in the miner network.

Example RFS clients/applications

We provide you with 6 example programs that you can use to test your system. Each of these programs assumes a file called .rfs in the current directory that contains two lines. The first line must contain the local IP:port address to use to connect to the miner, the second line must contain the IP:port address of the miner to connect to (these are exactly the two arguments to rfslib.Initialize described above).

ls [-a] : lists the files in the global namespace. The optional -a argument can be used to also list the number of records in each file.
cat fname : output all of the records in fname to stdout.
tail k fname : outputs the last k records in fname to stdout.
head k fname : outputs the first k records in fname to stdout.
append fname record-string : appends a new record string to fname.
touch fname : creates a blank file fname.
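For reference, the two-line .rfs file described above could be parsed along these lines. The function name ParseRFSFile is hypothetical, and skipping blank lines and validating each line with net.SplitHostPort are assumptions beyond what the spec requires.

```go
package main

import (
	"errors"
	"net"
	"strings"
)

// ParseRFSFile extracts the local address and the miner address from the
// contents of a .rfs file. Blank lines are ignored (an assumption), and
// each remaining line must have the form host:port.
func ParseRFSFile(content string) (localAddr, minerAddr string, err error) {
	var addrs []string
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		if _, _, err := net.SplitHostPort(line); err != nil {
			return "", "", err
		}
		addrs = append(addrs, line)
	}
	if len(addrs) != 2 {
		return "", "", errors.New(".rfs must contain exactly two IP:port lines")
	}
	return addrs[0], addrs[1], nil
}
```

A client would typically read the file with ioutil.ReadFile(".rfs"), convert the bytes to a string, and pass the two resulting addresses to rfslib.Initialize.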

Azure Deployment

The Azure cloud is composed of several data centers, located world-wide. Each data center hosts many thousands of machines that can be used (i.e., rented) for a price. In this project you will use the Azure cloud to deploy and test your solution in the wide area. That is, you will deploy your RFS system across several VMs on Azure.

Although you will test and deploy your system on Azure, it will have nothing Azure-specific about it, e.g., it should be possible to run your system on the CS ugrad servers without any code modifications (though we will not test this).

Using Azure: stop VMs when not using

We prepared a Google Slides presentation covering the basic workflow of getting a VM running on Azure for this/future assignments. To set up the Go environment in a VM you can use the azureinstall.sh script.

The default Azure subscription comes with a limitation of 20 cores per region. It is likely that you will need more than 20 nodes in this project. For this consider using several different data centers around the world.

Use this site to check your account balance.

Access information will be posted to Piazza.

A key detail is that each second that your VM is running it is draining your balance (yikes!). You should STOP your VMs when you are not using them. It's up to you to police your own account.

Implementation Requirements

Your system must be runnable on Azure Ubuntu machines configured with Go 1.9.7.
Your solution can only use standard library Go packages.
Your solution code must be formatted using gofmt.

Solution Spec

Write two Go programs called miner.go and rfslib.go that behave according to the description above. Hand in your code using a private UBC GitHub repository with a name format of P1-uid1-uid2 where uid1 and uid2 are the CS student ids of your team.

The rfslib.go file is the library whose spec is described above. The miner.go file implements a miner process and must have the following command line usage:

go run miner.go [settings]

Settings is a JSON file (see example config.json) that has the following fields. There are two types of fields: those that do not differ between miners in an instance of an RFS system, and those that do differ between miners.

Settings that do not differ between miners:

uint8 MinedCoinsPerOpBlock : The number of record coins mined for an op block
uint8 MinedCoinsPerNoOpBlock : The number of record coins mined for a no-op block
uint8 NumCoinsPerFileCreate : The number of record coins charged for creating a file
uint8 GenOpBlockTimeout : Time in milliseconds, the minimum time between op block mining (see diagram above).
string GenesisBlockHash : The genesis (first) block MD5 hash for this blockchain
uint8 PowPerOpBlock : The op block difficulty (proof of work setting: number of zeroes)
uint8 PowPerNoOpBlock : The no-op block difficulty (proof of work setting: number of zeroes)
uint8 ConfirmsPerFileCreate : The number of confirmations for a create file operation (the number of blocks that must follow the block containing a create file operation along longest chain before the CreateFile call can return successfully)
uint8 ConfirmsPerFileAppend : The number of confirmations for an append operation (the number of blocks that must follow the block containing an append operation along longest chain before the AppendRec call can return successfully). Note that this append confirm number will always be set to be larger than the create confirm number (above).

Settings that differ between miners:

string MinerID : The ID of this miner (max 16 characters).
string[] PeerMinersAddrs : An array of remote IP:port addresses, one per peer miner that this miner should connect to (using the OutgoingMinersIP below)
string IncomingMinersAddr : The local IP:port where the miner should expect other miners to connect to it (address it should listen on for connections from miners)
string OutgoingMinersIP : The local IP that the miner should use to connect to peer miners
string IncomingClientsAddr : The local IP:port where this miner should expect to receive connections from RFS clients (address it should listen on for connections from clients)

The settings above are required. You may include other settings, but please check with us first. You may also decide to start the miner using a different variant of the 'go run' command.
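The settings listed above map naturally onto a Go struct decoded with encoding/json. This is a sketch that assumes the JSON keys are spelled exactly like the field names in the lists above (check the example config.json); the Settings type and ParseSettings function are illustrative names, and the MinerID length check simply enforces the 16-character limit stated above.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Settings mirrors the fields of the miner settings file. The field types
// follow the types given in the spec.
type Settings struct {
	// Settings that do not differ between miners.
	MinedCoinsPerOpBlock   uint8
	MinedCoinsPerNoOpBlock uint8
	NumCoinsPerFileCreate  uint8
	GenOpBlockTimeout      uint8
	GenesisBlockHash       string
	PowPerOpBlock          uint8
	PowPerNoOpBlock        uint8
	ConfirmsPerFileCreate  uint8
	ConfirmsPerFileAppend  uint8

	// Settings that differ between miners.
	MinerID             string
	PeerMinersAddrs     []string
	IncomingMinersAddr  string
	OutgoingMinersIP    string
	IncomingClientsAddr string
}

// ParseSettings decodes the JSON settings data. Values that do not fit the
// declared types (for example 300 for a uint8 field) cause an error.
func ParseSettings(data []byte) (Settings, error) {
	var s Settings
	if err := json.Unmarshal(data, &s); err != nil {
		return s, err
	}
	if len(s.MinerID) > 16 {
		return s, errors.New("MinerID is longer than 16 characters")
	}
	return s, nil
}
```

In miner.go the data would typically come from reading the file named by the first command-line argument, for example with ioutil.ReadFile(os.Args[1]).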

Advice

Here are several points of advice that we recommend that you follow:

This project requires a non-trivial amount of design thinking and planning. Plan out as much as you can before starting to write code. One way to drive design is to think about and work through concrete scenarios. For example, consider a concrete topology of miners and two clients, and work through the scenario in which each client submits an operation. Consider what should happen when these operations conflict (e.g., attempt to create the same file), how these operations should be ordered, what happens if different generated blocks contain the same operation, etc.
Note that you have to design 2 APIs: the API between a client and a miner, and the API between two miners. Think about the requirements of these APIs: what should they provide to each party? Use these to drive your API design.
A major design component is to come up with the data structures in your system. For example, you'll need data structures for operations, blocks, the tree of blocks, the chain of blocks, etc. List all of the data structures you will need. Use the concrete scenarios to drive the design/requirements of these data structures. Do not forget that protocols between nodes, such as flooding, will also require supporting data structures.
Think carefully about how you would implement the RFS operations on top of your data structure design. For example, should you include the 'index' of a record in the operation that appends the record, or should you let the index be implicitly defined by the location of this record relative to other/previously appended records to this file? Even minor design decisions have non-trivial implications for the design of the RFS APIs.
Depending on your data structures, your operations may need unique identifiers. For example, if a client calls AppendRec twice with the same content, your system should append 2 copies of the content (not 0, not 1, not 4, not 10).
Your blockchain must maintain information about who owns what/how many record coins in the system. Think about how you will represent this information in your blocks.
Miners will connect and disconnect (churn). You will need protocol support to allow a miner to 'catch up' on the most recent state of the blockchain.
You have limited time for this project. So, it is critical that in your implementation you focus on correctness of your system, rather than performance. Your mark solely depends on how correct your system is. We do not care about how slow/inefficient your system is.
The example programs listed above, like ls, cat, and others, are end-to-end tests for your RFS system. We will use these (and other) programs during your demo to check the correctness of your system. Make sure that these programs run successfully with your system under a variety of conditions. For example, if one client writes a record to a file using append, can another client read that record using cat?
EC3 is intended to help you to debug your system. It will take some time to get GoVector and ShiViz to work, but they will save you a lot of time in the long run. Do not work on this EC last; instead work on this EC first, as you are starting to implement your system. If you do EC3 after you are done with your system, it will be useless to you.

Extra Credit

EC1 [2% of final mark]. Add a new API call, DeleteFile(fname), to delete a file from RFS. It should look like this:
- err ← DeleteFile(fname)
The updated RFS library interface that you must use is detailed in the rfslib-delete.go file (make sure to rename it to rfslib.go in your repository/implementation). Implementing DeleteFile requires you to carefully extend the current spec. Deleting a file operation should cost no record coins to perform. But, it must credit the miner IDs that have ever contributed operations to this file: it must credit the miner ID that created the file being deleted and the miner IDs that have appended any records to the file being deleted. For confirmation use the same confirmation number as for creating a file. To get credit for this EC, you must be able to demonstrate all of the above properties of your DeleteFile API to us.

For testing your DeleteFile implementation you can use the rm.go client:
- rm : removes a file from the global namespace.

EC2 [1% of final mark]. Create a malicious client and miner to demonstrate how a malicious miner could abuse the system. Your malicious miner should submit client operations into the miner network using other miner IDs and perform no mining of its own (neither op nor no-op blocks). When forging the miner ID on operations the malicious miner should use the ID of the miner that currently has the most record coins in the system.

EC3 [2% of final mark]: Add support for GoVector and ShiViz to your system. These are logging/log analysis tools that help you understand what happened in your system at runtime (GoVector implements vector clock tracking and ShiViz visualizes a log with vector clocks as a time space diagram). You must show us a live demo of GoVector+ShiViz in your system to get points for this extra credit. That is, you must run your system, generate logs, download the logs to your local machine, visualize these logs using ShiViz, and explain the ShiViz diagram to us.

Grading scheme and marking process

We will follow a demo style grading scheme, similar to A2. At the high level your mark for this assignment is 20% of your final mark and has these components:

Code: 10%
Demo: 10%

Note that the demo actually exercises both the Code and the Demo portion of your mark. So, the best way of thinking about the demo rubric below is that it is 100% of your mark. Of course, we reserve the right to look at your code on our own if we want to further convince ourselves about some functional aspect or to check that you do indeed implement some particular piece (e.g., calculating the number of coins a specific miner has).

The demo has several parts and will take 25 minutes:

Part A (20%) : Single miner, multi client (3 min)
Part B (60%) : Multi miner, multi client (10 min)
Part C (10%) : Topology failures and recovery (7 min)
Part D : Extra credit demo, if any
Part E (10%) : Q/A about system design (5 min)

Each demo will be supervised by 2 instructors: either two TAs or Ivan and one of the TAs. Each team has a guaranteed slot of 25 minutes. Note that 25 minutes may be a hard cut off, especially in cases where there is another team scheduled after your team. (We give ourselves about 5-10 minutes to deliberate your mark and review the demo after you leave the room. The more time we have for this, the more likely that you will get a fair shake).

Note that your demo will be run in our environment. That is, when you enter the demo room, we will have Azure VMs up and running and ready for your demo. The VMs will have:

Go version 1.9.7 installed
Emacs/vi/vim installed
GoVector library
Your project 1 git repository cloned and set to the last commit before the deadline
(Please talk to us before the demo if you have other software requirements.)

We will release a set of IPs (public and private) for two sets of VMs on Piazza. You will use one of these VM sets to demo your system.

When you come in, we will give you time to set up the miners. The miner topology will not be revealed to you. Each VM will contain a settings.json file which will be used to configure the miners. You will be responsible for starting the miners, but in parts A-D we will be using and running our own client programs to test the behaviour of your system.

Part A: Single miner, multi client

In part A, we will run several client programs that are connected to the same miner. This part will involve us testing basic functionality of your system, such as creating a file, appending records to a file, and reading records from a file.

Part B: Multi miner, multi client

In part B, we will run several client programs that are connected to different miners. Things we will be testing (with total percentage adding up to 60% for Part B):

15% : Create file semantics. Creating a file and verifying it was created via a client connected to a different miner
20% : Read/write semantics. Writing to a file and verifying the file contents via a client connected to a different miner
15% : Concurrency. Multiple clients issuing operations concurrently and checks to make sure your system resolves the concurrency (e.g., create of file with same name should only succeed once).
10% : Coin mining and tracking. Checking that you are mining and attributing the correct number of coins to the miners and correctly calculating the miner's balance.

Part C: Topology Failures and Recovery

In part C, we will test your system's ability to survive miner failures and to be able to include newly joined miners. This may include network partitions/joins.

Part D: Extra Credit Scenarios

EC1: We will delete a file and verify that the file no longer exists via client connected to a different miner. We will also verify that reading a record from / appending a record to a deleted file results in an error.

EC2: We will ask you to launch a malicious miner using the settings.json provided by us and you will be responsible for demonstrating how a malicious miner could abuse the system.

EC3: We will ask you to generate and show/explain logs by using ShiViz.

Part E: Design Questions

In this part we will ask you questions from a list of questions that we have curated.

We will try to make sure there is a whiteboard available for you to draw diagrams if you need one.
Each person on the team should attempt to answer the question on their own first. If the person is unable to do this, then the other person can jump in and help answer the question.
The questions are high-level design questions about your system. They are the kinds of questions that everyone on the team should know the answer to (at least to some degree!).
These questions may escalate into discussion about your design choices more broadly.

Preparing for the demo

Nice to have, or strongly recommended for the demo:

We strongly recommend including (comprehensible) stdout/logging in your codebase so that we can track things as they occur on the terminal. If you lack this, then we can only use the output of our client programs. If these client programs do not work, then it will be hard (impossible) for us to justify any partial marks.
Since the setup time counts as part of your 25 minutes, you can (and should!) write scripts to automate the process of logging into VMs, starting up miners in a designated directory, etc.

What to bring for the demo: You will need your own laptop(s) to start the miners. We recommend bringing two devices so that you can coordinate with your partner, and you have some redundancy + 4 hands can type faster than 2 hands.

Honesty and collaboration

Make sure to follow the course collaboration policy and refer to the submission instructions that detail how to submit your solution.