CPSC 449: Honours Thesis Joanna Knowitall 12352881 knowitall@cs.ubc.ca September 28, 1992 Supervisor: Jim Blah Project: Neverland Neverland is a distributed file system, dealing with storage of files across a system of distributed machines. It maintains a certain number of copies of each file on the client across all the connected machines. For each newly created file on the client machine, a policy is created, determining such things as the number of replicas necessary, and the frequency of updates. This is presented in the form of a table that maps each node to these values. In addition to the client machine, there is a network of interconnected replica nodes. These monitor each other constantly, to pass on the updated copies of files and also to detect failure of any of the nodes. One of these nodes, usually the one that has the smallest 'staleness factor' - in this case, this term simply means how often a file on the client machine is replicated to this node - is chosen as the leader. The leader's difference only comes into play when one of the nodes fails. In that case, all the other nodes inform the leader of the failure. It is then the leader's job to choose a new replica node to replace the failed one. It then builds a list/table of files that were on the failed node, and propagates the list to the new node. From this point, the new node takes over, and the leader goes back to its regular job. The new node must go through the list provided, and obtain copies of all the files. Since this is the part my project will be dealing with, specifically, I will discuss it in more detail below. The problem I will be dealing with can be summarised as the following: given a set (list) of files, how to efficiently pull the files from a number of replica nodes spread out across a network. A number of factors must be considered. One requirement is that the new node should probably get the newest copies of each file available, but not if that is likely to otherwise compromise the efficiency of the process. Other considerations are bandwidth available and load on the nodes, both the sending and receiving node, which would limit your ability to copy the files, and make an otherwise attractive node an inefficient choice. Connection limits factor into the decision, as they would also influence the time it would take to fully populate the new node with the necessary files. However, these factors should probably be secondary to the currency of the file version, and used to break ties and the like. The rough outline of the work plan is as follows. Over the next few months, I will be looking at various publications in the field over the past few years, and getting acquainted with the system (and code) that is in place. I have a few sources that I can start with, that have been referenced in previous research on Neverland, and from there I should be able to branch out into different sources. A good area to look in would be peer-to-peer systems, and those dealing with storage, in particular, as well as searching for articles on replication, how it's done and what are the most efficient ways to do it. The research on replication and file systems should help me get a better grasp on the area, and also compare this system against similar ones that may have been developed. This will help me figure out where the particular piece that I will be working on can fit best. By the end of November I should have completed a design document outlining the structure and functionality of the proposed component. Actual implementation should begin soon after that, going into second term. Having done the research on similar file systems with similar functionality, I should have enough information to be able to compare my implementation to others that have been proposed or finished. The implementation and testing should be done by March, and the rest of the time will be spent compiling the results and writing the report.