Assignment 2

416 Distributed Systems: Assignment 2 [Proof-of-work]

Due: Sunday January 24 at 6pm PST

Winter 2021

In this assignment you will continue programming in Go and will learn some basic inter-thread synchronization primitives. You will build a multi-threaded "bitcoin miner": a system that, based on an input slice of bytes, figures out by trial and error what suffix to that slice of bytes will hash the slice to a string representation that ends in some number of "0" characters. Due to the brute-force nature of the approach, it is an ideal target for divide-and-conquer methodology and is a great way to practice concurrency in Go lang.

Overview

This assignment's objective is introduce you to some threading primitives available in Go. Most of these are built-in directly into the language and are easy to use. You will learn:

How to use Go routines and Go channels
How to test concurrency in Go
How to use Go's md5 package

Proof-of-work

Proof-of-work is an important concept in blockchain-based systems, which are distributed systems for recording events, such as financial transactions. Replication of information in blockchains introduces the issue of trust: what if some malicious actor claims to be a replica, or somehow compromises a previously valid node, and attempts to falsify information?

One approach that helps with this problem is a cryptographic "might makes right" principle based on computing power. To generate a block in a blockchain system based on proof-of-work, nodes must perform a proof-of-work task: a task that is computationally difficult to perform, but the results of which are efficient to verify. This makes trust a function of computing power: unless a malicious actor controls a large fraction of computing power in the blockchain network, it will be computationally out-matched by the rest of the network.

The variation of proof-of-work used in this assignment is based on reversing the MD5 hashing function. While MD5 is no longer cryptographically trustworthy, and therefore not suitable for security-critical usage, the algorithm is widely available and simple to use. Most importantly for this assignment, it remains computationally hard to naively brute-force its reversal, while, like all reasonable hash functions, it is trivial to check if a result is correct by just running it in the intended direction.

The requirement of searching for a suffix of an existing byte slice allows for incorporating the notion of progression into proof-of-work: the prefix could be set to a hash of the blockchain's state-so-far, forcing the work to be computationally relevant to the blockchain's current state. Likewise, the number of "0" characters to find allows a configurable difficulty setting: hashes ending with just "0" will be more common than those ending with "00", and so forth, all the way to finding a hash that is all zeroes, which is very, very hard (try it!).

Parallelizing proof-of-work

Often, the division of a problem in a divide-and-conquer approach is self-evident from the problem structure. But, since it is not entirely obvious when dealing with byte sequences, and coming up with such a scheme is not the objective of the assignment, we will provide the method you should use:

The suffix to be calculated may have unbounded length, so somehow partitioning out segments of the suffix-space will not work.
Instead, consider the space of possible suffixes as a prefix-trie: as long as the number of threads is a power of two (or you are prepared to deal with a lot of edge cases), we can consider all possible suffixes as paths.
This set of paths can be partitioned by giving different workers different path fragments on which to build, ensuring each worker will explore separate parts of the search space.

Further assuming that the number of threads is a power of two, we can work through an example. Consider the case where we have 2 bits for the number of threads, giving us 2^2 = 4 threads. Each thread can then be given a different part of the key space:

Thread 0 can be given all suffixes starting with binary 00
Thread 1 can be given all suffixes starting with binary 01
Thread 2 can be given all suffixes starting with binary 10
Thread 3 can be given all suffixes starting with binary 11

Thus, we have enumerated all possible choices for the first 2 bits of the first byte of our possible prefixes. Now, each thread can explore the remaining 6 bits of that byte, along with any additional bytes in the suffix. The combined search operation will eventually cover all possible suffixes.

For even more simplicity, you can assume that we will not use thread counts larger than 2^8 = 256. So you do not have to worry about the thread count requiring more than one byte.

Implementation notes

The provided starter code will contain one public function that you should implement:

  func Mine(tracer *tracing.Tracer, nonce []uint8, numTrailingZeroes, threadBits uint) (secret []uint8)

The behaviour should be as follows:

nonce is a byte slice, which, when suffixed by the return value secret, should hash via MD5 to a byte slice whose string representation (via the format specifier "%x") ends in numTrailingZeroes '0' characters
threadBits indirectly specifies the number of worker goroutines to use when searching for secret: 1 << threadBits, or 2^threadBits, threads should be used. This should lead to an approximate 1/(1 << threadBits) decrease in execution time for different values of threadBits, keeping all other parameters the same.
tracer is used for tracing your solution's control flow. This can help you to check your solution and we will use for grading. See the Testing section below for a description of what tracing is required in this assignment. See https://github.com/DistributedClocks/tracing for more precise API documentation of the tracing library, which is published as a Go module on github.

Once a goroutine has found a satisfying suffix, all other goroutines should stop working as soon as possible, without completing their search. That is: you need a way to tell your worker goroutines to stop. All workers should signal cancellation or completion soon after a satisfying result has been found.
Shorter suffixes should be searched before longer ones; it is expected that you perform a simple linear search, only considering an additional byte once all options are exhausted for a given suffix length.
Use "crypto/md5" for hashing purposes.

We provide you with 6 files: (1) main.go, (2) hash_miner/hash_miner.go, (3) go.mod, (4) go.sum, (5) tracing_config.json, (6) tracing_server_config.json.

The file main.go contains code that invokes the hash_miner.Mine function in an appropriate context, including proper setup of the tracing facility. You can use this file for testing ad-hoc scenarios (including, for instance, running two searches in sequence), or as basis for you own automated tests in the style of assignment 1.

The go.mod file contains dependency tracking metadata, including a dependency on https://github.com/DistributedClocks/tracing, the tracing library.

The go.sum file contains auto-generated metadata (checksums for module dependencies). You should not need to touch or understand this file.

Your solution should be added to hash_miner/hash_miner.go, which is the only file you should meaningfully change. You may also modify main.go and add additional files (e.g., tests) for your convenience, but you should not rely on any changes there: we will discard these files during grading.

The configuration files tracing_config.json and tracing_server_config.json provide configuration for a basic instance of the tracer and tracing server facilities, respectively. As above, you may change them for your own convenience (e.g., more testing), but we will discard these during grading, replacing them with our own versions.

Testing

This assignment introduces the tracing library, which will be used to test whether your code is behaving correctly. The provided file main.go contains an example of how to run tracing on a single process, and the provided stub hash_miner/hash_miner.go contains examples of how to report actions being performed. Your solution should precisely follow the tracing semantics described in the next section.

No automated testing is provided for this assignment, though you are welcome to build your own, as has been demonstrated in assignment 1.

All graded tests will be performed via calls to hash_miner.Mine, orchestrated in some unspecified way. Timing will be measured, to ensure that when more threads are used, your code can find a solutions to the same problem more efficiently. With such an embarrassingly parallel problem, speed-up is expected to be roughly proportional to thread count. You can test this yourself by choosing a large number of zeroes and timing your code (perhaps using the Linux time command) for different values of threadBits.

Tracing semantics

In your solution, tracer.RecordAction should be called on one of the collection of structs defined in hash_miner/hash_miner.go to report a given action. Each action you need to report, alongside its meaning, is listed here:

MiningBegin{} signifies the start of mining. This should appear exactly once, before any other actions.
MiningComplete{Secret} signifies the end of mining. This should appear exactly once after all other actions, and should contain one of the discovered secrets. If hash_miner.Mine is called more than once, this should appear strictly before any actions relating to any subsequent calls.
WorkerStart{ThreadByte} indicates that a given worker with ThreadByte has started searching for the secret. Every worker must report this exactly once as part of its execution, before any other actions that worker might record.
WorkerSuccess{ThreadByte,Secret} indicates that the worker with ThreadByte has found a secret. This worker should terminate, recording no actions past this one.
WorkerCancelled{ThreadByte} indicates that the worker with ThreadByte received a cancellation. This worker should terminate, recording no actions past this one.

Note that ThreadByte refers to the bit prefix a worker has been assigned to explore. If we are considering two-bit prefixes, and a worker's ThreadByte is 3, then that worker should be exploring only secrets that begin with binary 11.

To precisely define "begins with", here are some examples, in Go slice syntax, of secrets that might be found by a worker with threadByte = 3 (in binary: 11), given an arbitrary nonce for which the secret is valid:

[]uint8{ 192, 168, 1, 1 }, because 192 is 11000000 in binary, which left-to-right starts with 11
[]uint8{ 242, 0, 1, 9, 6 }, because 242 is 11110010 in binary, which also starts with 11
[]uint8{ 193 }, because 193 is 11000001 in binary, which again starts with 11

Note that there are several (allowed) race conditions. It is possible for more than one worker to report a WorkerSuccess action. In this case, the MiningComplete{Secret} action should report the first secret that was found (corresponding to the first recorded WorkerSuccess action).

Also note that cancellation actions should be logged strictly after a secret has been discovered (after at least one WorkerSuccess action).

To illustrate all these things together, take for example the following correct tracing output, using numTrailingZeroes=7, threadBits=4:

[miner] MiningBegin 
[miner] WorkerStart ThreadByte=1
[miner] WorkerStart ThreadByte=3
[miner] WorkerStart ThreadByte=15
[miner] WorkerStart ThreadByte=2
[miner] WorkerStart ThreadByte=10
[miner] WorkerStart ThreadByte=5
[miner] WorkerStart ThreadByte=4
[miner] WorkerStart ThreadByte=6
[miner] WorkerStart ThreadByte=0
[miner] WorkerStart ThreadByte=9
[miner] WorkerStart ThreadByte=13
[miner] WorkerStart ThreadByte=14
[miner] WorkerStart ThreadByte=11
[miner] WorkerStart ThreadByte=12
[miner] WorkerStart ThreadByte=7
[miner] WorkerStart ThreadByte=8
[miner] WorkerSuccess ThreadByte=12, Secret=[194 170 210 13]
[miner] WorkerCancelled ThreadByte=9
[miner] WorkerCancelled ThreadByte=11
[miner] WorkerCancelled ThreadByte=4
[miner] WorkerCancelled ThreadByte=2
[miner] WorkerCancelled ThreadByte=3
[miner] WorkerCancelled ThreadByte=14
[miner] WorkerCancelled ThreadByte=13
[miner] WorkerCancelled ThreadByte=10
[miner] WorkerCancelled ThreadByte=7
[miner] WorkerCancelled ThreadByte=0
[miner] WorkerCancelled ThreadByte=6
[miner] WorkerCancelled ThreadByte=1
[miner] WorkerCancelled ThreadByte=15
[miner] WorkerCancelled ThreadByte=8
[miner] WorkerCancelled ThreadByte=5
[miner] MiningComplete Secret=[194 170 210 13]

Note that date/time info, alongside some benign logging noise, have been omitted relative to what you might see when actually running your solution. These are the important parts you should pay attention to.

Assumptions you can make

No malicious or otherwise invalid inputs will be given to hash_miner.Mine. Values will be given within specified ranges, according to stated invariants.

Assumptions you cannot make

Anything beyond the basic semantics of goroutines and channels; sufficient testing will likely expose any such assumptions as violating some of the tracing conditions.

Implementation requirements

The client code must be runnable on CS ugrad machines and be compatible with Go version 1.15.6.
Your code must be compatible with the given API, and run successfully against the provided test cases.
Your solution can only use standard library Go packages, and the tracing library.
Your solution code must be Gofmted using gofmt.

Starter code

You can download all the starter files here: starter.zip.

Note that you cannot change the API, or the tracing structs. Our marking scripts will rely on these to automatically grade your solution.

Handin instructions

Keep the provided file layout at the top-level in your solution. You may include additional files, but we do not expect this will be necessary. Do not reference any additional libraries in your solution.

As with A1, use your personal repository for this assignment under the CPSC416-2020W-T2 org. Your repository will look like cpsc416_a1_USERNAME

Rough grading scheme

Your code must not change the API above. Your code must work on ugrad servers.

If any of these are violated, your mark for this assignment is 0. This is true regardless of how many characters had to be changed to make your solution compile, and regardless of how well your solution works with a different API or on a different machine.

The high-level A2 mark breakdown looks like this:

5% : MiningBegin action is reported correctly
5% : MiningComplete action is reported correctly
5% : WorkerStart action is reported correctly
5% : Output secret is associated with the right goroutine via ThreadByte prefix
10% : WorkerSuccess action is reported correctly
10% : WorkerCancelled action is reported correctly
40% : The solution runs in parallel across the right number of goroutines
20% : The output secret is correct

Advice

Hint: look at bytes.Buffer, and fmt.Fprintf (it is not immediately obvious, but bytes.Buffer satisfies io.Writer) when dealing with the string representation of a hash.
Compile and run your code on the ugrad servers.

Make sure to follow the course collaboration policy and refer to the submission instructions that detail how to submit your solution.