Date: March 5th, 2009
Room: DMP 310
Speaker: Ian Munro
Title: Succinct Data Structures
Abstract:
Although computer memories, at all levels of the hierarchy, have grown dramatically over the past few years, increased problem size continues to outstrip this growth. Recently developed data compression techniques attack one major aspect of the problem, but here we focus on structural information: combinatorial objects such as trees, other classes of graphs, permutations and the like. The interest is in representations that are not only terse, but also permit the basic operations one would expect on the underlying data type to be performed quickly without decoding large portions of the data. We call such data structures succinct. The archetypal example is the binary tree, whose usual representation requires 4n lg n bits, if we are to navigate up and down the tree and report subtree size. The information theoretic minimum, however, is only about 2n bits. For trees of a few billion nodes this is a factor of about 64 between the conventional representation and the information theoretic minimum. Such binary trees are particularly interesting as they can be used for indexing large texts (or genetic information). A factor of 64 or more in space costs makes the difference between an approach being very attractive and totally infeasible. We present a representation requiring essentially this minimal space while supporting, in constant time, the natural operations used in traversing a tree. The general approach is then applied to several other structures to obtain optimal (or near optimal) space bounds while still supporting the key operations in constant time. Finally inherent time/space tradeoffs will be discussed.