Technical Reports

## 2001 UBC CS Technical Report Abstracts

TR-2001-01 Quantum Signal Propagation in Depolarizing Channels, March 28, 2001 Nicholas Pippenger, 7 pages

Quantum Signal Propagation in Depolarizing Channels Nicholas Pippenger Abstract: Let X be an unbiassed random bit, let Y be a qubit whose mixed state depends on X, and let the qubit Z be the result of passing Y through a depolarizing channel, which replaces Y with a completely random qubit with probability p. We measure the quantum mutual information between X and Y by T(X; Y) = S(X) + S(Y) - S(X,Y), where S(...) denotes von Neumann's entropy. (Since X is a classical bit, the quantity T(X; Y) agrees with Holevo's bound chi(X; Y) to the classical mutual information between X and the outcome of any measurement of Y.) We show that T(X;Z) <= (1-p)^2 T(X;Y). This generalizes an analogous bound for classical mutual information due to Evans and Schulman, and provides a new proof of their result.

TR-2001-02 Analysis of Carry Propagation in Addition: An Elementary Approach, March 28, 2001 Nicholas Pippenger, 23 pages

Analysis of Carry Propagation in Addition: An Elementary Approach Nicholas Pippenger Abstract: Our goal in this paper is to analyze carry propagation in addition using only elementary methods (that is, those not involving residues, contour integration, or methods of complex analysis). Our results concern the length of the longest carry chain when two independent uniformly distributed n-bit numbers are added. First, we show using just first- and second-moment arguments that the expected length C_n of the longest carry chain satisfies C_n = log_2 n + O(1). Second, we use a sieve (inclusion-exclusion) argument to give an exact formula for C_n. Third, we give an elementary derivation of an asymptotic formula due to Knuth, C_n = log_2 n + Phi(log_2 n) + O((log n)^4 / n), where Phi(x) is a bounded periodic function of x, with period 1, for which we give both a simple integral expression and a Fourier series. Fourth, we give an analogous asymptotic formula for the variance V_n of the length of the longest carry chain: V_n = Psi(log_2 n) + O((log n)^5 / n), where Psi(x) is another bounded periodic function of x, with period 1. Our approach can be adapted to addition with the "end-around" carry that occurs in the sign-magnitude and 1s-complement representations. Finally, our approach can be adapted to give elementary derivations of some asymptotic formulas arising in connection with radix-exchange sorting and collision-resolution algorithms, which have previously been derived using contour integration and residues.

TR-2001-03 Proving Sequential Consistency by Model Checking, May 17, 2001 Tim Braun, Anne Condon, Alan J. Hu, Kai S. Juse, Marius Laza, Michael Leslie and Rita Sharma, 23 pages

Sequential consistency is a multiprocessor memory model of both practical and theoretical importance. The general problem of deciding whether a finite-state protocol implements sequential consistency is undecidable. In this paper, however, we show that for the protocols that arise in practice, proving sequential consistency can be done automatically in theory and can be reduced to regular language inclusion via a small amount of manual effort. In particular, we introduce an approach to construct finite-state ``observers'' that guarantee that a protocol is sequentially consistent. We have developed possible observers for several cache coherence protocols and present our experimental model checking results on a substantial directory-based cache coherence protocol. From a theoretical perspective, our work characterizes a class of protocols, which we believe encompasses all real protocols, for which sequential consistency can be decided. From a practical perspective, we are presenting a methodology for designing memory protocols such that sequential consistency may be proven automatically via model~checking.

TR-2001-05 Separating Crosscutting Concerns Across the Lifecycle:, August 08, 2001 Siobhan Clarke and Robert J. Walker, 13 pages

From Composition Patterns to AspectJ and Hyper/J distribution or persistence) present many problems for software development that manifest themselves throughout the lifecycle. Inherent properties of crosscutting requirements, such as scattering (where their support is scattered across multiple classes) and tangling (where their support is tangled with elements supporting other requirements), reduce the reusability, extensibility, and traceability of the affected software artefacts. Scattering and tangling exist both in designs and code and must therefore be addressed in both. To remove scattering and tangling properties, a means to separate the designs and code of crosscutting behaviour into independent models or programs is required. This paper discusses approaches that achieve exactly that in either designs or code, and presents an investigation into a means to maintain this separation of crosscutting behaviour seamlessly across the lifecycle. To achieve this, we work with composition patterns at the design level, AspectJ and Hyper/J at the code level, and investigate a mapping between the two levels. Composition patterns are a means to separate the design of crosscutting requirements in an encapsulated, independent, reusable, and extensible way. AspectJ and Hyper/J are technologies that provide similar levels of separation for Java code. We discuss each approach, and map the constructs from composition patterns to those of AspectJ and Hyper/J. We first illustrate composition patterns with the design of the Observer pattern, and then map that design to the appropriate code. As this is achieved with varying levels of success, the exercise also serves as a case study in using those implementation techniques.

TR-2001-06 Aspect-Oriented Incremental Customization of Middleware Services, May 28, 2001 Alex Brodsky, Dima Brodsky, Ida Chan, Yvonne Coady, Jody Pomkoski and Gregor Kiczales, 12 pages

As distributed applications evolve, incremental customization of middleware services is often required; these customizations should be unpluggable, modular, and efficient. This is difficult to achieve because the customizations depend on both application-specific needs and the services provided. Although middleware allows programmers to separate application-specific functionality from lower-level details, traditional methods of customization do not allow efficient modularization. Currently, making even minor changes to customize middleware is complicated by the lack of locality. Programmers may have to compromise between the two extremes: to interpose a simple, well-localized layer of functionality between the application and middleware, or to make a large number of small, poorly localized, invasive changes to all execution points which interact with middleware services. Although the invasive approach allows a more efficient customization, it is harder to ensure consistency, more tedious to implement, and exceedingly difficult to unplug. Thus, a common approach is to add an extra layer for systemic concerns such as robustness, caching, filtering, and security. Aspect-oriented programming (AOP) offers a potential alternative between the interposition and invasive approaches by providing modular support for the implementation of crosscutting concerns. AOP enables the implementation of efficient customizations in a structured and unpluggable manner. We demonstrate this approach by comparing traditional and AOP customizations of fault tolerance in a distributed file system model, JNFS. Our results show that using AOP can reduce the amount of invasive code to almost zero, improve efficiency by leveraging the existing application behaviour, and facilitate incremental customization and extension of middleware services.

TR-2001-07 Using Versioning to Simplify the Implementation of a Highly-Available File System, January 23, 2001 Dima Brodsky, Jody Pomkoski, Mike Feely, Norm Hutchinson and Alex Brodsky, 5 pages

(Abstract not available on-line)

TR-2001-08 Image-Based Measurement of Light Sources With Correct Filtering, July 30, 2002 Wolfgang Heidrich and Michael Goesele, 9 pages

In this document we explore the theory and potential experimental setups for measuring the near field of a complex luminary. This work extends on near field photometry by taking filtering issues into account. The physical measurement setups described here have not been tested at the time of writing this document, we simply describe several possibilities here. Once actual tests have been performed, the results will be published elsewhere.

TR-2001-09 Constraint-Based Agents: A Formal Model for Agent Design, May 25, 2001 Alan K. Mackworth and Ying Zhang, 20 pages

Formal models for agent design are important for both practical and theoretical reasons. The Constraint-Based Agent (CBA) model includes a set of tools and methods for specifying, designing, simulating, building, verifying, optimizing, learning and debugging controllers for agents embedded in an active environment. The agent and the environment are modelled symmetrically as, possibly hybrid, dynamical systems in Constraint Nets. This paper is an integrated presentation of the development and application of the CBA framework, emphasizing the important special case where the agent is an online constraint-satisfying device. Using formal modeling and specification, it is often possible to verify complex agents as obeying real-time temporal constraint specifications and, sometimes, to synthesize controllers automatically. In this paper, we take an engineering point of view, using requirements specification and system verification as measurement tools for intelligent systems. Since most intelligent systems are real-time dynamic systems, the requirements specification must be able to represent timed properties. We have developed timed \$\forall\$-automata for this purpose. We present this formal specification, examples of specifying requirements and a general procedure for verification. The CBA model demonstrates the power of viewing constraint programming as the creation of online constraint-solvers for dynamic constraints.

TR-2001-10 The Shortest Disjunctive Normal Form of a Random Boolean Function, June 08, 2001 Nicholas Pippenger, 28 pages

The Shortest Disjunctive Normal Form of a Random Boolean Function Nicholas Pippenger This paper gives a new upper bound for the average length l(n) of the shortest disjunctive normal form for a random Boolean function of n arguments, as well as new proofs of two old results related to this quantity. We consider a random Boolean function of n arguments to be uniformly distributed over all 2^{2^n} such functions. (This is equivalent to considering each entry in the truth-table to be 0 or 1 independently and with equal probabilities.) We measure the length of a disjunctive normal form by the number of terms. (Measuring it by the number of literals would simply introduce a factor of n into all our asymptotic results.) We give a short proof using martingales of Nigmatullin's result that almost all Boolean functions have the length of their shortest disjunctive normal form asymptotic to the average length l(n). We also give a short information-theoretic proof of Kuznetsov's lower bound l(n) >= (1+o(1)) 2^n / log n log log n. (Here log denotes the logarithm to base 2.) Our main result is a new upper bound l(n) <= (1+o(1)) H(n) 2^n / log n log log n, where H(n) is a function that oscillates between 1.38826... and 1.54169.... The best previous upper bound, due to Korshunov, had a similar form, but with a function oscillating between 1.581411... and 2.621132.... The main ideas in our new bound are (1) the use of Ro"dl's "nibble" technique for solving packing and covering problems, (2) the use of correlation inequalities due to Harris and Janson to bound the effects of weakly dependent random variables, and (3) the solution of an optimization problem that determines the sizes of "nibbles" and larger "bites" to be taken at various stages of the construction.

TR-2001-11 Characterizations of Random Set-Walks, August 1, 2001 Joseph H. T. Wong, 58 pages

In this thesis, we introduce a new class of set-valued random processes called random set-walk, which is an extension of the classical random walk that takes into account both the nonhomogeneity of the walk's environment, and the additional factor of nondeterminism in the choices of such environments. We also lay down the basic framework for studying random set-walks. We define the notion of a characteristic tuple as a 4-tuple of first-exit probabilities which characterizes the behaviour of a random walk in a nonhomogeneous environment, and a characteristic tuple set as its analogue for a random set-walk. We prove several properties of random set-walks and characteristic tuples, from which we derive our main result: the long-run behaviour of a sequence of random set-walks, relative to the endpoints of the walks, converges as the length of the walks tend to infinity.

TR-2001-12 Enumeration of Matchings in the Incidence Graphs of Complete and Complete Bipartite Graphs, September 10, 2001 Nicholas Pippenger, 23 pages

Enumeration of Matchings in the Incidence Graphs of Complete and Complete Bipartite Graphs Nicholas Pippenger If G = (V, E) is a graph, the incidence graph I(G) is the graph with vertices the union of V and E and an edge joining v in V and e in E when and only when v is incident with e in G. For G equal to K_n (the complete graph on n vertices) or K_{n,n} (the complete bipartite graph on n + n vertices), we enumerate the matchings (sets of edges, no two having a vertex in common) in I(G), both exactly (in terms of generating functions) and asymptotically. We also enumerate the equivalence classes of matchings (where two matchings are considered equivalent if there is an automorphism of G that induces an automorphism of I(G) that takes one to the other).

TR-2001-13 Concern Graphs: Finding and Describing Concerns Using Structural Program Dependencies, September 10, 2001 Martin P. Robillard and Gail C. Murphy, 11 pages

Many maintenance tasks address concerns, or features, that are not well modularized in the source code comprising a system. Existing approaches available to help software developers locate and manage scattered concerns use a representation based on lines of source code, complicating the analysis of the concerns. In this paper, we introduce the Concern Graph representation that abstracts the implementation details of a concern and makes explicit the relationships between different parts of the concern. The abstraction used in a Concern Graph has been designed to allow an obvious and inexpensive mapping back to the corresponding source code. To investigate the practical tradeoffs related to this approach, we have built the Feature Exploration and Analysis tool (FEAT) that allows a developer to manipulate a concern representation extracted from a Java system, and to analyze the relationships of that concern to the code base. We have used this tool to find and describe concerns related to software change tasks. We have performed case studies to evaluate the feasibility, usability, and scalability of the approach. Our results indicate that Concern Graphs can be used to document a concern for change, that developers unfamiliar with Concern Graphs can use them effectively, and that the underlying technology scales to industrial-sized programs.

TR-2001-14 Loosely Coupled Optimistic Replication for Highly Available, Scalable Storage, September 13, 2001 Dima Brodsky, Jody Pomkoski, Michael J. Feeley, Norman Hutchinson and Alex Brodsky, 12 pages

People are becoming increasingly reliant on computing devices and are trusting increasingly important data to persistent storage. These systems should protect this data from failure and ensure that it is available anytime, from anywhere. Unfortunately, traditional mechanisms for ensuring high availability suffer from the complexity of maintaining consistent, distributed replicas of data. This paper describes Mammoth, a novel file system that uses a loosely-connected set of nodes to replicate data and maintain consistency. The key idea of Mammoth is that files and directories are stored as histories of immutable versions and that all meta-data is stored in append-only change logs. Users specify availability policies for their files and the system uses these policies to replicate certain, but not necessarily all, versions to remote nodes to protect them from a variety of failures. Because file data is immutable, it can be freely replicated without complicating the file's consistency. File and directory meta-data is replicated using an optimistic policy that allows partitioned nodes to read and write whatever file versions are currently accessible. When network partitions heal, inconsistent meta-data is reconciled by merging the meta-data updates made in each partition; conflicting updates manifest as branches in the file's or directory's history and can thus can be further resolved by higher-level software or users. We describe our design and the implementation and performance of an early prototype.

TR-2001-15 Bayesian Latent Semantic Analysis of Multimedia Databases, October 11, 2001 Nando de Freitas and Kobus Barnard, 35 pages

We present a Bayesian mixture model for probabilistic latent semantic analysis of documents with images and text. The Bayesian perspective allows us to perform automatic regularisation to obtain sparser and more coherent clustering models. It also enables us to encode a priori knowledge, such as word and image preferences. The learnt model can be used for browsing digital databases, information retrieval with image and/or text queries, image annotation (adding words to an image) and text illustration (adding images to a text).

TR-2001-17 Clustering Facial Displays in Context, November 13, 2001 Jesse Hoey, 16 pages

A computer user's facial displays will be context dependent, especially in the presence of an embodied agent. Furthermore, each interactant will use their face in different ways, for different purposes. These two hypotheses motivate a method for clustering patterns of motion in the human face. Facial motion is described using optical flow over the entire face, projected to the complete orthogonal basis of Zernike polynomials. A context-dependent mixture of hidden Markov models (cmHMM) clusters the resulting temporal sequences of feature vectors into facial display classes. We apply the clustering technique to sequences of continuous video, in which a single face is tracked and spatially segmented. We discuss the classes of patterns uncovered for a number of subjects.

TR-2001-18 The Optimized Segment Support Map for the Mining of Frequent Patterns, November 15, 2001 Carson Kai-Sang Leung, Raymond T. Ng and Heikki Mannila, 25 pages

Computing the frequency of a pattern is a key operation in data mining algorithms. We describe a simple, yet powerful, way of speeding up any form of frequency counting satisfying the monotonicity condition. Our method, the optimized segment support map (OSSM), is based on a simple observation about data: Real life data sets are not necessarily be uniformly distributed. The OSSM is a light-weight structure that partitions the collection of transactions into m segments, so as to reduce the number of candidate patterns that require frequency counting. We study the following problems: (i) What is the optimal value of m, the number of segments to be used (the segment minimization problem)? (ii) Given a user-determined m, what is the best segmentation/composition of the m segments (the constrained segmentation problem)? For the segment minimization problem, we provide a thorough analysis and a theorem establishing the minimum value of m for which there is no accuracy lost in using the OSSM. For the constrained segmentation problem, we develop various algorithms and heuristics to help facilitate segmentation. Our experimental results on both real and synthetic data sets show that our segmentation algorithms and heuristics can efficiently generate OSSMs that are compact and effective.

TR-2001-19 Animation of Fish Swimming, January 30, 2002 William F. Gates, 8 pages

We present a simple, two-part model of the locomotion of slender-bodied aquatic animals designed specifically for the needs of computer animation. The first part of the model is kinematic and addresses body deformations for three swimming modes: steady swimming, rapid starting, and turning. The second part of the model is dynamic and addresses the resulting propulsion of the aquatic animal. While this approach is not as general as a fully dynamic model, it provides the animator with a small set of intuitive parameters that directly control how the fish model moves and is more efficient to simulate.

TR-2001-20 Free-Surface Conditions in the Realistic Animation of Liquids, January 30, 2002 William F. Gates, 13 pages

The realistic animation of liquids based on the dynamic simulation of free-surface flow requires appropriate conditions on the liquid-gas interface. These conditions can be painstaking to implement and are in general not unique. We present the conditions we use in our implementation of a fluid animation system and discuss our rationale behind them.

TR-2001-22 Controlling Fluid Flow Simulation, January 30, 2002 William F. Gates and Alain Fournier, 3 pages

Simulating fluid dynamics can be a powerful approach to animating liquids and gases, but it is often difficult to ``direct'' the simulation to ``perform'' as desired. We introduce a simple yet powerful technique of controlling incompressible flow simulation for computer animation purposes that works for any simulation method using a projection scheme for numerically solving the Navier-Stokes equations. In our technique, an abstract vector field representing the desired influence over the simulated flow is modelled using simple primitives. This technique allows an arbitrarily degree of control over the simulated flow at every point while still conserving mass, momentum, and energy.