CS Theses & Dissertations 2009

For 2009 graduation dates (in alphabetical order by last name):

Model-based clustering for aCGH data using variational EM
Alain, Guillaume
DOI : 10.14288/1.0051536
URI : http://hdl.handle.net/2429/11992
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. K. Murphy

Computational approaches for RNA energy parameter estimation
Andronescu, Mirela Stefania
DOI : 10.14288/1.0051272
URI : http://hdl.handle.net/2429/2794
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-05
Supervisors : Dr. Condon, Dr. Hoos

RNA molecules play important roles, including catalysis of chemical reactions and control of gene expression, and their functions largely depend on their folded structures. Since determining these structures by biochemical means is expensive, there is increased demand for computational predictions of RNA structures. One computational approach is to find the secondary structure (a set of base pairs) that minimizes a free energy function for a given RNA molecule. The forces driving RNA folding can be approximated by means of a free energy model, which associates a free energy parameter with each distinct feature considered. The main goal of this thesis is to develop state-of-the-art computational approaches that can significantly increase the accuracy (i.e., maximize the number of correctly predicted base pairs) of RNA secondary structure prediction methods, by improving and refining the parameters of the underlying RNA free energy model. We propose two general approaches to estimate RNA free energy parameters. The Constraint Generation (CG) approach is based on iteratively generating constraints that enforce known structures to have energies lower than other structures for the same molecule. The Boltzmann Likelihood (BL) approach infers a set of RNA free energy parameters which maximize the conditional likelihood of a set of known RNA structures. We discuss several variants and extensions of these two approaches, including a linear Gaussian Bayesian network that defines relationships between features. Overall, BL gives slightly better results than CG, but it is over ten times more expensive to run. In addition, CG is much simpler to implement. We obtain significant improvements in the accuracy of RNA minimum free energy secondary structure prediction with and without pseudoknots (regions of non-nested base pairs), when measured on large sets of RNA molecules with known structures. For the Turner model, which has been the gold-standard model without pseudoknots for more than a decade, the average prediction accuracy of our new parameters increases from 60% to 71%. For two models with pseudoknots, we obtain an increase of 9% and 6%, respectively. To the best of our knowledge, our parameters are currently state-of-the-art for the three considered models.
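
To make the Constraint Generation loop concrete, here is a minimal Python sketch of the idea, not the thesis's implementation: energies are assumed linear in a parameter vector theta, each molecule supplies its known structure plus a small fixed set of alternative structures (represented as feature-count vectors), and violated constraints are accumulated and re-fit with a simple hinge-loss subgradient step. All names, the candidate-set representation, and the margin are illustrative assumptions.

import numpy as np

def constraint_generation(data, n_features, margin=1.0, lr=0.1, rounds=50):
    # data: list of (phi_known, [phi_candidate, ...]) per molecule, where each
    # phi_* is a feature-count vector and the energy of a structure is theta @ phi.
    theta = np.zeros(n_features)
    constraints = []                           # accumulated (phi_known - phi_alt) rows
    for _ in range(rounds):
        added = False
        for phi_known, candidates in data:
            # alternative structure with the lowest energy under the current theta
            alt = min(candidates, key=lambda phi: theta @ phi)
            # violated if the known structure is not lower in energy by the margin
            if theta @ phi_known > theta @ alt - margin:
                constraints.append(phi_known - alt)
                added = True
        # re-fit theta by subgradient descent on the hinge losses of all constraints
        for _ in range(100):
            grad = np.zeros(n_features)
            for d in constraints:
                if theta @ d + margin > 0:     # constraint theta @ d <= -margin violated
                    grad += d
            theta -= lr * grad / max(len(constraints), 1)
        if not added:                          # no new violated constraints: done
            break
    return theta

In the thesis the alternative structures come from full minimum free energy prediction rather than a fixed candidate list, and the re-fitting step is solved more carefully; the sketch only mirrors the overall generate-constraints-then-refit loop.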

Bounded-curvature motion planning amid polygonal obstacles
Backer, Jonathan
DOI : 10.14288/1.0051257
URI : http://hdl.handle.net/2429/5153
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-05
Supervisor : Dr. Kirkpatrick

We consider the problem of finding a bounded-curvature path in the plane from one configuration α_s to another configuration α_t that avoids the interior of a set of polygonal obstacles E. We call any such path from α_s to α_t a feasible path. In this thesis, we develop algorithms to find feasible paths that have explicit guarantees on when they will return a feasible path. We phrase our guarantees and run time analysis in terms of the complexity of the desired solution (see k and λ below). In a sense, our algorithms are output sensitive, which is particularly desirable because there are no known bounds on the solution complexity amid arbitrary polygonal environments. Our first major result is an algorithm that given E, α_s, α_t, and a positive integer k either (i) verifies that every feasible path has a descriptive complexity greater than k or (ii) outputs a feasible path. The run time of this algorithm is bounded by a polynomial in n (the total number of obstacle vertices in E), m (the bit precision of the input), and k. This result complements earlier work by Fortune and Wilfong: their algorithm considers paths of arbitrary descriptive complexity (it has no dependence on k), but it never outputs a path, just whether or not a feasible path exists. Our second major result is an algorithm that given E, α_s, α_t, a length λ, and an approximation factor ε, either (i) verifies that every feasible path has length greater than λ or (ii) constructs a feasible path that is at most (1 + ε) times longer than the shortest feasible path. The run time of this algorithm is bounded by a polynomial in n, m, 1/ε, and λ. This algorithm is the result of applying the techniques developed earlier in our thesis to the previous approximation approaches. A shortcoming of these prior approximation algorithms is that they only search a special class of feasible paths. This restriction implies that the path that they return may be arbitrarily longer than the shortest path. Our algorithm returns a true approximation because we search for arbitrary shortest paths.

Path planning for improved target visibility : maintaining line of sight in a cluttered environment
Baumann, Matthew Alexander
DOI : 10.14288/1.0051416
URI : http://hdl.handle.net/2429/4481
Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisor : Dr. Little

New probabilistic inference algorithms that harness the strengths of variational and Monte Carlo methods
Carbonetto, Peter Salvatore
DOI : 10.14288/1.0051537
URI : http://hdl.handle.net/2429/11990
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-11
Supervisor : Dr. de Freitas

The central objective of this thesis is to develop new algorithms for inference in probabilistic graphical models that improve upon the state-of-the-art and lend new insight into the computational nature of probabilistic inference. The four main technical contributions of this thesis are: 1) a new framework for inference in probabilistic models based on stochastic approximation, variational methods and sequential Monte Carlo is proposed that achieves significant improvements in accuracy and reductions in variance over existing Monte Carlo and variational methods, and at a comparable computational expense, 2) for many instances of the proposed approach to probabilistic inference, constraints must be imposed on the parameters, so I describe a new stochastic approximation algorithm that adopts the methodology of primal-dual interior-point methods and handles constrained optimization problems much more robustly than existing approaches, 3) a new class of conditionally-specified variational approximations based on mean field theory is described, which, when combined with sequential Monte Carlo, overcome some of the limitations imposed by conventional variational mean field approximations, and 4) I show how recent advances in variational inference can be used to implement inference and learning in a novel contingently acyclic probabilistic relational model, a model developed for the purpose of making predictions about relationships in a social network.
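
As a generic illustration of the constrained stochastic approximation setting in the second contribution (and explicitly not the primal-dual interior-point method developed in the thesis), the following Python sketch runs a Robbins-Monro iteration with a simple projection onto box constraints; the toy objective, step-size schedule, and names are assumptions made for the example.

import numpy as np

def projected_stochastic_approximation(noisy_grad, project, x0, steps=1000):
    # Robbins-Monro iteration x_{t+1} = project(x_t - a_t * g_t), where noisy_grad
    # returns an unbiased stochastic estimate of the gradient and a_t = 1/(t+1).
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        x = project(x - noisy_grad(x) / (t + 1))
    return x

# Toy use: minimize E[(x - Z)^2]/2 with Z ~ N(2, 1), subject to 0 <= x <= 1.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x - rng.normal(2.0, 1.0)   # unbiased gradient estimate
project = lambda x: np.clip(x, 0.0, 1.0)          # projection onto the box [0, 1]
print(projected_stochastic_approximation(noisy_grad, project, x0=0.5))  # approx. 1.0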

A framework for the lightweight augmentation of webcast archives
Chan, Clarence Bradley
DOI : 10.14288/1.0051467
URI : http://hdl.handle.net/2429/5790
Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisor : Dr. Booth

Homography Estimation
Dubrofsky, Elan Nathan
Master's essay available online :

Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisors : Dr. Woodham, Dr. Little

Design of haptic signals for information communication in everyday environments
Enriquez, Mario Javier
DOI : 10.14288/1.0051354
URI : http://hdl.handle.net/2429/2740
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-05
Supervisor : Dr. MacLean

Multi-function interfaces have become increasingly pervasive and are frequently used in contexts which pose multiple demands on a single sensory modality. Assuming some degree of modularity in attentional processing and that using a different sensory channel for communication can reduce interference with critical visual tasks, one possibility is to divert some information through the touch sense. The goal of this thesis is to advance our knowledge of relevant human capabilities and embed this knowledge into haptic communication design tools and procedures, in the interest of creating haptically supported interfaces that decrease rather than add to their users’ sensory and cognitive load. In short, we wanted to create tools and methods that would allow the creation of haptic signals (accomplished via display of either forces or vibrations) extending beyond the one bit of communication offered by current pagers and cellular phone buzzers. In our quest to create information-rich haptic signals, we need to learn how to create signals that are differentiable. We also need to study ways to assign meanings to these signals and make sure that they can be perceived clearly when presented one after another, even in environments where their recipient might be involved with other tasks. These needs frame the specific research goals of this thesis. Most of the results described here were obtained through the study of tactile (in the skin) rather than proprioceptive (force feedback) stimuli. We begin by presenting several methods to create, validate and contrast tactile stimulus dissimilarity data and investigate the design of a waveform intended to be a tactile perceptual intermediate between a square waveform and a triangle waveform. Next, we explore methods to create and test tactile signal-meaning associations and document a surprising ability of participants to exhibit high recall of quickly learned associations at two weeks in a first examination of longitudinal recall of tactile stimuli. We then present methods to measure tactile stimulus masking and identify crucial perceptual thresholds relating to stimulus temporal spacing in an exploration into the masking effects of common-onset vibrotactile stimuli. Finally, we present methods to test haptic and multimodal perception in simulated scenarios including a method to simulate and control cognitive workload; and provide evidence that the commonly-used device of multimodal signal reinforcement can adversely impact performance in an ongoing primary task. The research presented in this thesis has implications for the design of signals to be used in displays that are emerging in embedded computing environments such as cars, games, cellular phones, and medical devices.

Supporting feature awareness and improving performance with personalized graphical user interfaces
Findlater, Leah
DOI : 10.14288/1.0051322
URI : http://hdl.handle.net/2429/11622
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-11
Supervisor : Dr. McGrenere

Personalized graphical user interfaces have the potential to reduce visual complexity and improve efficiency by modifying the interface to better suit an individual user's needs. Working in a personalized interface can make users faster, more accurate and more satisfied; in practice, however, personalization also comes with costs, such as a reliance on user effort to control the personalization, or the introduction of spatial instability when interface items are reorganized automatically. We conducted a series of studies to examine both the costs and benefits of personalization, and to identify techniques and contexts that would be the most likely to provide an overall benefit. We first interviewed long-term users of a software application that provides adaptable (user-controlled) personalization. A design trade-off that emerged is that while personalization can increase the accessibility of features useful to a user's current task, it may in turn negatively impact the user's awareness of the full set of available features. To assess this potential trade-off, we introduced awareness as an evaluation metric to be used alongside more standard performance measures and we ran a series of three studies to understand how awareness relates to core task performance. These studies used two different measures to assess awareness, showing that personalization can impact both the recognition rate of unused features in the interface and user performance on new tasks requiring those features. We investigated both adaptive (system-controlled) and adaptable personalization techniques to help us understand the generalizability of the awareness concept. In addition to introducing and incorporating awareness into our evaluations, we studied how specific contextual and design characteristics impact the user's experience with adaptive interfaces. In one study, we evaluated the impact of screen size on performance and user satisfaction with adaptive split menus. Results showed that the performance and satisfaction benefits of spatially reorganizing items in the interface are more likely to outweigh the costs when screen size is small. We also introduced a new adaptive personalization technique that maintains spatial stability, called ephemeral adaptation, and evaluated it through two studies. Ephemeral adaptation improves performance over both another closely related adaptive technique and a traditional interface.

Particle Markov chain Monte Carlo
Holenstein, Roman
DOI : 10.14288/1.0051665
URI : http://hdl.handle.net/2429/7319
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-05
Supervisor : Dr. Doucet

Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) methods have emerged as the two main tools to sample from high-dimensional probability distributions. Although asymptotic convergence of MCMC algorithms is ensured under weak assumptions, the performance of these algorithms is unreliable when the proposal distributions used to explore the space are poorly chosen and/or if highly correlated variables are updated independently. In this thesis we propose a new Monte Carlo framework in which we build efficient high-dimensional proposal distributions using SMC methods. This allows us to design effective MCMC algorithms in complex scenarios where standard strategies fail. We demonstrate these algorithms on a number of example problems, including simulated tempering, nonlinear non-Gaussian state-space models, and protein folding.
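
A minimal Python sketch of one such algorithm, particle marginal Metropolis-Hastings, is given below for a toy linear-Gaussian state-space model: a bootstrap particle filter estimates the marginal likelihood of each proposed parameter value, and that estimate is used directly in the Metropolis-Hastings acceptance ratio. The model, the flat prior on [-1, 1], and the tuning constants are illustrative assumptions, not the settings used in the thesis.

import numpy as np

rng = np.random.default_rng(1)

def bootstrap_log_likelihood(theta, y, n_particles=200):
    # Bootstrap particle filter estimate of log p(y | theta) for the toy model
    #   x_t = theta * x_{t-1} + v_t,  v_t ~ N(0, 1);   y_t = x_t + w_t,  w_t ~ N(0, 1)
    x = rng.normal(0.0, 1.0, size=n_particles)
    log_lik = 0.0
    for y_t in y:
        x = theta * x + rng.normal(0.0, 1.0, size=n_particles)       # propagate
        log_w = -0.5 * (y_t - x) ** 2 - 0.5 * np.log(2 * np.pi)      # weight
        m = log_w.max()
        w = np.exp(log_w - m)
        log_lik += m + np.log(w.mean())          # log of the average unnormalized weight
        x = x[rng.choice(n_particles, size=n_particles, p=w / w.sum())]   # resample
    return log_lik

def pmmh(y, n_iters=500, step=0.1):
    # Particle marginal Metropolis-Hastings over theta with a flat prior on [-1, 1].
    theta = 0.0
    ll = bootstrap_log_likelihood(theta, y)
    samples = []
    for _ in range(n_iters):
        prop = theta + step * rng.normal()
        if -1.0 <= prop <= 1.0:                  # proposals outside the prior are rejected
            ll_prop = bootstrap_log_likelihood(prop, y)
            if np.log(rng.uniform()) < ll_prop - ll:   # MH ratio with estimated likelihoods
                theta, ll = prop, ll_prop
        samples.append(theta)
    return np.array(samples)

# Toy data simulated with theta = 0.7, then posterior samples for theta.
x_true, y = 0.0, []
for _ in range(50):
    x_true = 0.7 * x_true + rng.normal()
    y.append(x_true + rng.normal())
print(pmmh(np.array(y)).mean())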

Automated configuration of algorithms for solving hard computational problems
Hutter, Frank Roman
DOI : 10.14288/1.0051652
URI : http://hdl.handle.net/2429/13907
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-11
Supervisors : Dr. Hoos, Dr. K. Murphy

The best-performing algorithms for many hard problems are highly parameterized. Selecting the best heuristics and tuning their parameters for optimal overall performance is often a difficult, tedious, and unsatisfying task. This thesis studies the automation of this important part of algorithm design: the configuration of discrete algorithm components and their continuous parameters to construct an algorithm with desirable empirical performance characteristics. Automated configuration procedures can facilitate algorithm development and be applied on the end user side to optimize performance for new instance types and optimization objectives. The use of such procedures separates high-level cognitive tasks carried out by humans from tedious low-level tasks that can be left to machines. We introduce two alternative algorithm configuration frameworks: iterated local search in parameter configuration space and sequential optimization based on response surface models. To the best of our knowledge, our local search approach is the first that goes beyond local optima. Our model-based search techniques significantly outperform existing techniques and extend them in ways crucial for general algorithm configuration: they can handle categorical parameters, optimization objectives defined across multiple instances, and tens of thousands of data points. We study how many runs to perform for evaluating a parameter configuration and how to set the cutoff time, after which algorithm runs are terminated unsuccessfully. We introduce data-driven approaches for making these choices adaptively, most notably the first general method for adaptively setting the cutoff time. Using our procedures—to the best of our knowledge still the only ones applicable to these complex configuration tasks—we configured state-of-the-art tree search and local search algorithms for SAT, as well as CPLEX, the most widely-used commercial optimization tool for solving mixed integer programs (MIP). In many cases, we achieved improvements of orders of magnitude over the algorithm default, thereby substantially improving the state of the art in solving a broad range of problems, including industrially relevant instances of SAT and MIP. Based on these results, we believe that automated algorithm configuration procedures, such as ours, will play an increasingly crucial role in the design of high-performance algorithms and will be widely used in academia and industry.
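
The following Python sketch illustrates the first framework, iterated local search in a discrete parameter configuration space; the one-exchange neighbourhood, the acceptance rule, the restart probability, and the placeholder cost function (e.g., mean runtime of the target algorithm over a set of benchmark instances) are simplifying assumptions rather than the exact procedure developed in the thesis.

import random

def iterated_local_search(domains, cost, restart_prob=0.01, perturb_moves=3, max_evals=500):
    # domains: dict mapping parameter name -> list of allowed values
    # cost(config): empirical cost of running the target algorithm with config,
    #               e.g. mean runtime over a set of benchmark instances
    rng = random.Random(0)
    evals = 0

    def neighbours(config):
        # one-exchange neighbourhood: change a single parameter value
        for p, values in domains.items():
            for v in values:
                if v != config[p]:
                    yield {**config, p: v}

    def local_search(config, config_cost):
        # first-improvement descent to a local optimum
        nonlocal evals
        improved = True
        while improved and evals < max_evals:
            improved = False
            for nb in neighbours(config):
                evals += 1
                c = cost(nb)
                if c < config_cost:
                    config, config_cost, improved = nb, c, True
                    break
        return config, config_cost

    current = {p: rng.choice(vs) for p, vs in domains.items()}
    current, cur_cost = local_search(current, cost(current))
    incumbent, inc_cost = dict(current), cur_cost
    while evals < max_evals:
        cand = dict(current)
        for _ in range(perturb_moves):            # perturb with a few random one-exchange moves
            p = rng.choice(list(domains))
            cand[p] = rng.choice(domains[p])
        cand, cand_cost = local_search(cand, cost(cand))
        if cand_cost <= cur_cost:                 # accept improving or equally good configurations
            current, cur_cost = cand, cand_cost
        if cur_cost < inc_cost:
            incumbent, inc_cost = dict(current), cur_cost
        if rng.random() < restart_prob:           # occasional random restart
            current = {p: rng.choice(vs) for p, vs in domains.items()}
            cur_cost = cost(current)
    return incumbent, inc_cost

A caller would supply something like domains = {'heuristic': ['a', 'b', 'c'], 'noise': [0.1, 0.2, 0.5]} together with a cost function that actually runs and times the target algorithm; adaptive capping of runs and the choice of how many instances to evaluate per configuration, discussed above, are omitted here.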

The second eigenvalue and random walks in random regular graphs with increasing girth
Izsak, Alexander
DOI : 10.14288/1.0051678
URI : http://hdl.handle.net/2429/12649
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. Friedman

An effective solution for Bluetooth ad hoc networking
Jia, Sijun
DOI : 10.14288/1.0051480
URI : http://hdl.handle.net/2429/7238
Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisor : Dr. Vuong

RadFS : virtualizing filesystems
Karollil, Anoop
DOI : 10.14288/1.0051466
URI : http://hdl.handle.net/2429/5749
Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisors : Dr. Feeley, Dr. Warfield

On uniform sampling of cliques
Khorvash, Massih
DOI : 10.14288/1.0051700
URI : http://hdl.handle.net/2429/13923
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisors : Dr. Kirkpatrick, Dr. Hoos

SATenstein : automatically building local search SAT solvers from components
KhudaBukhsh, Ashiqur Rahman
DOI : 10.14288/1.0051500
URI : http://hdl.handle.net/2429/13852
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisors : Dr. Leyton-Brown, Dr. Hoos

HMM converter : a tool box for hidden Markov models with two novel, memory-efficient parameter training algorithms
Lam, Tin Yin
DOI : 10.14288/1.0051281
URI : http://hdl.handle.net/2429/5786
Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisor : Dr. Meyer

A paradigm for classroom presentations on large, high-resolution displays
Lanir, Yoel
DOI : 10.14288/1.0051452
URI : http://hdl.handle.net/2429/13468
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-11
Supervisor : Dr. Booth

Large classrooms traditionally provided multiple blackboards on which entire lectures were visible. In recent decades, classrooms were augmented with data projectors, allowing computer-generated slides to replace hand-written blackboard notes. Many lecture halls and conference rooms are now equipped with multiple projectors that provide large, high-resolution displays of sizes comparable to old-fashioned blackboard arrays. The predominant presentation software, however, is still designed for a single projector. Our research goal was to understand how software tools for classroom presentations can bridge the gap between traditional and newer technology to take full advantage of increased screen resolution and real estate to benefit both instructors and students. As a first step, we conducted observational studies to see how blackboards and computer slides were used in classroom and conference settings. We developed a classification of usage for traditional and electronic visual aids. We then introduced six guidelines for designers of presentation tools to support learning. Based on the guidelines, we developed MultiPresenter, a novel presentation system designed to work on large display surfaces with multiple or high-resolution displays. MultiPresenter allows instructors to organize and present pre-made and dynamic presentations using a personal laptop. Instructors can employ the extra screen real estate to provide short- or long-term persistence of information. MultiPresenter was used by eight instructors in fifteen classes over five consecutive semesters. We collected usage data and iteratively refined the software. Based on our data, we identified a set of pedagogical practices for using extra space in electronic presentations and we revised our design guidelines. This led to a conceptual framework for effectively using large display surfaces in classrooms. To empirically examine the effect of increased screen real estate on students' learning, we conducted a controlled laboratory study. Results indicated that, when used properly, a lecture shown on multiple screens can improve learning over a regular single-screen lecture.

Protecting Xen hypercalls : intrusion detection/prevention in a virtualization environment
Le, Cuong Hoang H.
DOI : 10.14288/1.0051653
URI : http://hdl.handle.net/2429/14849
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisors : Dr. Hutchinson, Dr. Aiello

Whale Tank Virtual Reality
Maksakov, Evgeny Vladimirovich
DOI : 10.14288/1.0051200
URI : http://hdl.handle.net/2429/12925
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. Booth

Interpreter implementation of advice weaving
Naseer, Muhammad Immad
DOI : 10.14288/1.0051259
URI : http://hdl.handle.net/2429/7384
Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisor : Dr. Kiczales

Computing degree-of-knowledge values for a developer's workspace
Ou, Jingwen
DOI : 10.14288/1.0051655
URI : http://hdl.handle.net/2429/15227
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. G. Murphy

The Design of a Tool for Teaching Hierarchical Control for Robot Navigation
Ren, Hao
Master's essay available online :

Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisor : Dr. Poole

A multimedia interface for facilitating comparisons of opinions
Rizoli, Lucas
DOI : 10.14288/1.0051680
URI : http://hdl.handle.net/2429/12616
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. Carenini

Garment modeling from fashion drawings and sketches
Robson, Cody John
DOI : 10.14288/1.0051713
URI : http://hdl.handle.net/2429/13396
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. Sheffer

Single exposure high dynamic range imaging with a conventional camera using cross-screen filters
Rouf, Mushfiqur
DOI : 10.14288/1.0051847
URI : http://hdl.handle.net/2429/24121
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. Heidrich

Model based approaches to array CGH data analysis
Shah, Sohrab P.
DOI : 10.14288/1.0051341
URI : http://hdl.handle.net/2429/2808
Degree : Doctor of Philosophy - PhD
Graduation Date : 2009-05
Supervisors : Dr. Ng, Dr. K. Murphy

DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. CNAs are segments of DNA that have been deleted or amplified and can range in size from one kilobase to whole chromosome arms. Development of array comparative genomic hybridization (aCGH) technology enables CNAs to be measured at sub-megabase resolution using tens of thousands of probes. However, aCGH data are noisy and result in continuous-valued measurements of the discrete CNAs. Consequently, the data must be processed through algorithmic and statistical techniques in order to derive meaningful biological insights. We introduce model-based approaches to analysis of aCGH data and develop state-of-the-art solutions to three distinct analytical problems. In the simplest scenario, the task is to infer CNAs from a single aCGH experiment. We apply a hidden Markov model (HMM) to accurately identify CNAs from aCGH data. We show that borrowing statistical strength across chromosomes and explicitly modeling outliers in the data improves on baseline models. In the second scenario, we wish to identify recurrent CNAs in a set of aCGH data derived from a patient cohort. These are locations in the genome altered in many patients, providing evidence for CNAs that may be playing important molecular roles in the disease. We develop a novel hierarchical HMM profiling method that explicitly models both statistical and biological noise in the data and is capable of producing a representative profile for a set of aCGH experiments. We demonstrate that our method is more accurate than simpler baselines on synthetic data, and show our model produces output that is more interpretable than other methods. Finally, we develop a model-based clustering framework to stratify a patient cohort, expected to be composed of a fixed set of molecular subtypes. We introduce a model that jointly infers CNAs, assigns patients to subgroups and infers the profiles that represent each subgroup. We show our model to be more accurate on synthetic data, and show in two patient cohorts how the model discovers putative novel subtypes and clinically relevant subgroups.
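
As a minimal illustration of the first scenario only (and not the thesis's model, which additionally shares strength across chromosomes, models outliers, and extends to cohort profiling and clustering), the Python sketch below runs Viterbi decoding in a three-state Gaussian HMM over probe log-ratios to call loss, neutral, and gain segments; the emission means, standard deviation, and transition stickiness are illustrative values.

import numpy as np

def viterbi_cna_calls(log_ratios, means=(-0.5, 0.0, 0.5), sd=0.2, stay=0.99):
    # Three hidden states (loss, neutral, gain) with Gaussian emissions on the
    # aCGH log-ratios and "sticky" transitions that favour staying in a state.
    K = len(means)
    means = np.asarray(means, dtype=float)
    log_trans = np.log(np.full((K, K), (1 - stay) / (K - 1)))
    np.fill_diagonal(log_trans, np.log(stay))
    log_emit = lambda y: -0.5 * ((y - means) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))

    T = len(log_ratios)
    delta = np.zeros((T, K))                  # best log-probability ending in each state
    back = np.zeros((T, K), dtype=int)
    delta[0] = np.log(1.0 / K) + log_emit(log_ratios[0])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans    # scores[i, j]: transition from i to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit(log_ratios[t])

    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):            # backtrack the most probable state path
        states[t] = back[t + 1, states[t + 1]]
    return states                             # 0 = loss, 1 = neutral, 2 = gain

# Toy probe log-ratios: a neutral stretch followed by a gained segment.
rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(0.0, 0.2, 100), rng.normal(0.5, 0.2, 40)])
print(viterbi_cna_calls(y)[95:105])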

Paceline : toward high-bandwidth interactive continuous media applications over TCP
Tayarani Najaran, Mahdi
DOI : 10.14288/1.0051679
URI : http://hdl.handle.net/2429/12570
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. Krasic

An active help system to improve program navigation
Viriyakattiyaporn, Petcharat
DOI : 10.14288/1.0051654
URI : http://hdl.handle.net/2429/14846
Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisor : Dr. G. Murphy

Privacy-preserving publishing of moving objects databases
Yarovoy, Roman
DOI : 10.14288/1.0051481
URI : http://hdl.handle.net/2429/7154
Degree : Master of Science - MSc
Graduation Date : 2009-05
Supervisor : Dr. Lakshmanan

Abstraction of man-made shapes
Zhou, Qingnan (James)
DOI : 10.14288/1.0051201
URI : http://hdl.handle.net/2429/12633
Degree : Master of Science - MSc
Graduation Date : 2009-11
Supervisor : Dr. Sheffer