Approximating Hit Rate Curves using Streaming Algorithms - FLS Talk by Nick Harvey, UBC/CS

Date
Location

DMP 110 (6245 Agronomy Rd.)

Abstract:
A hit rate curve is a function mapping cache size to the proportion of requests that can be served from the cache, for a fixed caching policy and a fixed sequence of requests. Hit rate curves have been studied for decades in the operating system, database and computer architecture communities. They are useful tools for designing appropriate cache sizes, dynamically allocating memory between competing caches, and for summarizing locality properties of the request sequence.
 
Existing algorithms for computing hit rate curves typically process a stream of requests online, in a single pass and in near-linear time. An obstacle to using hit rate curves is often the amount of space required by existing algorithms. For a cache using the Least Recently Used policy and a stream of m requests for n cacheable objects, all existing algorithms to compute the hit rate curve use space linear in n, which is often prohibitive.
 
We present the first algorithm for approximating hit rate curves with sublinear space and guaranteed accuracy. Our algorithm can approximate a hit rate curve at L uniformly-spaced points to within additive error epsilon using O( L^2 log(n) log^2(m) / epsilon^2 ) bits of space. The algorithm requires little code to implement, and it performs extremely well in practice. The central idea of the algorithm is to execute in parallel numerous estimators for the number of distinct elements in a data stream.
 
Bio:
Nick Harvey works on a broad range of topics in algorithm design, including combinatorial optimization, randomized algorithms, spectral graph theory and algorithms for use in computer systems. He completed his PhD at the Massachusetts Institute of Technology in 2008. He was a postdoctoral researcher at Microsoft Research New England from 2008-2009, and an assistant professor at the University of Waterloo from 2009-2011. Since 2011 he is an assistant professor and Canada Research Chair at the University of British Columbia. He is a recipient of the the 2013 Alfred P. Sloan Research Fellowship, the Best Student Paper Award at FOCS 2006, and the Best Paper Award at USITS 2003.