CS Theses & Dissertations 2008

For 2008 graduation dates (in alphabetical order by last name):

On efficient recommendations for online exchange markets
Abbassi, Zeinab
DOI : 10.14288/1.0051430
URI : http://hdl.handle.net/2429/3961
Degree : Master of Science - MSc
Graduation Date : 2008-11

Metadata services for the Parallax storage system
Aggarwal, Gitika
DOI : 10.14288/1.0051577
URI : http://hdl.handle.net/2429/5586
Degree : Master of Science - MSc
Graduation Date : 2008-11

Assisting bug report triage through recommendation
Anvik, John
DOI : 10.14288/1.0051337
URI : http://hdl.handle.net/2429/265
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-05

A key collaborative hub for many software development projects is the issue tracking system, or bug repository. The use of a bug repository can improve the software development process in a number of ways, including allowing developers who are geographically distributed to communicate about project development. However, reports added to the repository need to be triaged by a human, called the triager, to determine if reports are meaningful. If a report is meaningful, the triager decides how to organize the report for integration into the project's development process. We call triager decisions with the goal of determining if a report is meaningful repository-oriented decisions, and triager decisions that organize reports for the development process development-oriented decisions. Triagers can become overwhelmed by the number of reports added to the repository. Time spent triaging also typically diverts valuable resources away from the improvement of the product to the managing of the development process. To assist triagers, this dissertation presents a machine learning approach to create recommenders that assist with a variety of development-oriented decisions. In this way, we strive to reduce human involvement in triage by moving the triager's role from having to gather information to make a decision to that of confirming a suggestion. This dissertation introduces a triage-assisting recommender creation process that can create a variety of different development-oriented decision recommenders for a range of projects. The recommenders created with this approach are accurate: recommenders for which developer to assign a report to have a precision of 70% to 98% over five open source projects, recommenders for which product component a report concerns have a recall of 72% to 92%, and recommenders for who to add to the cc: list of a report have a recall of 46% to 72%.
We have evaluated recommenders created with our triage-assisting recommender creation process using both an analytic evaluation and a field study. In addition, we present in this dissertation an approach to assist project members to specify the project-specific values for the triage-assisting recommender creation process, and show that such recommenders can be created with a subset of the repository data.
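The assignment recommenders described above are built by supervised text classification over past reports. As a rough illustration only (not Anvik's actual features or learner, which the dissertation details), a toy multinomial naive Bayes recommender over report words could look like this, with entirely hypothetical report data:

```python
from collections import Counter, defaultdict
import math

class TriageRecommender:
    """Toy multinomial naive Bayes over bug-report words (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, reports):
        # reports: iterable of (report_text, developer) pairs
        for text, dev in reports:
            words = text.lower().split()
            self.label_counts[dev] += 1
            self.word_counts[dev].update(words)
            self.vocab.update(words)

    def recommend(self, text, k=1):
        words = text.lower().split()
        total = sum(self.label_counts.values())
        scores = {}
        for dev, n in self.label_counts.items():
            # log prior plus Laplace-smoothed log likelihood
            score = math.log(n / total)
            denom = sum(self.word_counts[dev].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[dev][w] + 1) / denom)
            scores[dev] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]

recommender = TriageRecommender()
recommender.train([
    ("crash in layout engine null pointer", "alice"),
    ("layout engine renders overlapping nodes", "alice"),
    ("installer fails on windows registry key", "bob"),
])
print(recommender.recommend("null pointer crash in layout"))  # ['alice']
```

A real triager-assisting recommender would train on a project's full repository history and, as the abstract notes, confirm rather than replace the triager's decision.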

Feature-based graph visualization
Archambault, Daniel William
DOI : 10.14288/1.0051247
URI : http://hdl.handle.net/2429/2839
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-11

A graph consists of a set and a binary relation on that set. Each element of the set is a node of the graph, while each element of the binary relation is an edge of the graph that encodes a relationship between two nodes. Graphs are pervasive in many areas of science, engineering, and the social sciences: servers on the Internet are connected, proteins interact in large biological systems, social networks encode the relationships between people, and functions call each other in a program. In these domains, the graphs can become very large, consisting of hundreds of thousands of nodes and millions of edges. Graph drawing approaches endeavour to place these nodes in two- or three-dimensional space with the intention of fostering an understanding of the binary relation by a human being examining the image. However, many of these approaches to drawing do not exploit higher-level structures in the graph beyond the nodes and edges. Frequently, these structures can be exploited for drawing. As an example, consider a large computer network where nodes are servers and edges are connections between those servers. If a user would like to understand how servers at UBC connect to the rest of the network, a drawing that accentuates the set of nodes representing those servers may be more helpful than an approach where all nodes are drawn in the same way. In a feature-based approach, features are subgraphs exploited for the purposes of drawing. We endeavour to depict not only the binary relation, but the high-level relationships between features. This thesis extensively explores a feature-based approach to graph visualization and demonstrates the viability of tools that aid in the visualization of large graphs. Our contributions lie in presenting and evaluating novel techniques and algorithms for graph visualization. We implement five systems in order to empirically evaluate these techniques and algorithms, comparing them to previous approaches.
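One basic feature-based operation suggested by the UBC-servers example is collapsing a feature subgraph into a single metanode so a drawing can treat it as one unit. A minimal sketch with hypothetical node names (not code from the thesis):

```python
def collapse_feature(edges, feature, metanode):
    """Collapse a feature (a set of nodes) into a single metanode,
    keeping only edges that cross the feature boundary."""
    collapsed = set()
    for u, v in edges:
        u2 = metanode if u in feature else u
        v2 = metanode if v in feature else v
        if u2 != v2:  # drop edges internal to the feature
            collapsed.add((min(u2, v2), max(u2, v2)))
    return collapsed

# Hypothetical network: two UBC servers plus two external servers.
edges = {("ubc1", "ubc2"), ("ubc1", "ext1"), ("ubc2", "ext2"), ("ext1", "ext2")}
print(collapse_feature(edges, {"ubc1", "ubc2"}, "UBC"))
```

The collapsed graph can then be drawn with the metanode accentuated, while the full feature subgraph is shown only on demand.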

Cerebral : visualizing multiple experimental conditions on a graph with biological context
Barsky, Aaron
DOI : 10.14288/1.0051296
URI : http://hdl.handle.net/2429/2857
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-11

Systems biologists use interaction graphs to model the behaviour of biological systems at the molecular level. In an iterative process, such biologists observe the reactions of living cells under various experimental conditions, view the results in the context of the interaction graph, and then propose changes to the graph model. These graphs represent dynamic knowledge of the biological system being studied and evolve as new insight is gained from the experimental data. While numerous graph layout and drawing packages are available, these tools did not fully meet the needs of our immunologist collaborators. In this thesis, we describe the data display needs of these immunologists and translate these needs into visual encoding decisions. These decisions led us to create Cerebral, a system that uses a biologically guided graph layout and incorporates experimental data directly into the graph display. Our graph layout algorithm uses simulated annealing with constraints, optimized with a uniform grid to have an expected runtime of o(E/V). Small multiple views of different experimental conditions and a measurement-driven parallel coordinates view enable correlations between experimental conditions to be analyzed at the same time that the measurements are viewed in the graph context. This combination of coordinated views allows the biologist to view the data from many different perspectives simultaneously. To illustrate the typical analysis tasks performed, we analyze two datasets using Cerebral. Based on feedback from our collaborators, we conclude that Cerebral is a valuable tool for analyzing experimental data in the context of an interaction graph model.
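Cerebral's layout combines simulated annealing with biological constraints and a uniform-grid optimization. As a rough, unconstrained illustration of the annealing core only (not the thesis's algorithm), one can minimize a simple edge-length cost over random node moves:

```python
import math
import random

def layout_cost(pos, edges, ideal=1.0):
    # Sum of squared deviations of edge lengths from an ideal length.
    cost = 0.0
    for u, v in edges:
        (x1, y1), (x2, y2) = pos[u], pos[v]
        d = math.hypot(x1 - x2, y1 - y2)
        cost += (d - ideal) ** 2
    return cost

def anneal_layout(nodes, edges, steps=2000, t0=1.0, seed=0):
    rng = random.Random(seed)
    pos = {n: (rng.uniform(0, 3), rng.uniform(0, 3)) for n in nodes}
    best, best_cost = dict(pos), layout_cost(pos, edges)
    cost = best_cost
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-6  # linear cooling schedule
        n = rng.choice(nodes)
        old = pos[n]
        pos[n] = (old[0] + rng.gauss(0, 0.1), old[1] + rng.gauss(0, 0.1))
        new_cost = layout_cost(pos, edges)
        if new_cost < cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best_cost, best = cost, dict(pos)
        else:
            pos[n] = old  # reject the move
    return best, best_cost

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
pos, cost = anneal_layout(nodes, edges)
```

Cerebral additionally folds layout constraints (e.g., subcellular-location bands) into the cost function and uses a uniform grid so each move is evaluated locally rather than over all edges.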

PVIT: A task-based approach for design and evaluation of interactive visualizations for preferential choice
Bautista, Jeanette Lyn
DOI : 10.14288/1.0051291
URI : http://hdl.handle.net/2429/740
Degree : Master of Science - MSc
Graduation Date : 2008-05

A common model for ubiquitous computing
Blackstock, Michael Anthony
DOI : 10.14288/1.0051355
URI : http://hdl.handle.net/2429/2478
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-11

Ubiquitous computing (ubicomp) is a compelling vision for how people will interact with multiple computer systems in the course of their daily lives. To date, practitioners have created a variety of infrastructures, middleware and toolkits to provide the flexibility, ease of programming and the necessary coordination of distributed software and hardware components in physical spaces. However, to date no one approach has been adopted as a default or de facto standard. Consequently, the field risks losing momentum as fragmentation occurs. In particular, the goal of ubiquitous deployments may stall as groups deploy and trial incompatible point solutions in specific locations. In their defense, researchers in the field argue that it is too early to standardize and that room is needed to explore specialized domain-specific solutions. In the absence of an agreed-upon set of standards, we argue that the community must consider a methodology that allows systems to evolve and specialize, while at the same time allowing the development of portable applications and integrated deployments that work between sites. To address this we studied the programming models of many commercial and research ubicomp systems. Through this survey we gained an understanding of the shared abstractions required in a core programming model suitable for both application portability and systems integration. Based on this study we designed an extensible core model called the Ubicomp Common Model (UCM) to describe a representative sample of ubiquitous systems to date. The UCM is instantiated in a flexible and extensible platform called the Ubicomp Integration Framework (UIF) to adapt ubicomp systems to this model. Through application development and integration experience with a composite campus environment, we provide strong evidence that this model is adequate for application development and that the complexity of developing adapters to several representative systems is not onerous.
The performance overhead introduced by interposing the centralized UIF between applications and an integrated system is reasonable. Through careful analysis and the use of well understood approaches to integration, this thesis demonstrates the value of our methodology that directly leverages the significant contributions of past research in our quest for ubicomp application and systems interoperability.

Ontology alignment in the presence of a domain ontology : finding protein homology
Carbonetto, Andrew August
DOI : 10.14288/1.0051437
URI : http://hdl.handle.net/2429/821
Degree : Master of Science - MSc
Graduation Date : 2008-11

Semi-automatic fitting of deformable 3D models to 2D sketches
Chang, Xianglong
DOI : 10.14288/1.0051370
URI : http://hdl.handle.net/2429/797
Degree : Master of Science - MSc
Graduation Date : 2008-11

Reducing remodularization complexity through modular-objective decoupling
Chern, Rick
DOI : 10.14288/1.0051353
URI : http://hdl.handle.net/2429/1380
Degree : Master of Science - MSc
Graduation Date : 2008-11

SUE : an advertisement recommendation framework utilizing categorized events and stimuli
Cheung, Billy Chi Hong
DOI : 10.14288/1.0051206
URI : http://hdl.handle.net/2429/2450
Degree : Master of Science - MSc
Graduation Date : 2008-05

MAIDS for VoIP : a Mobile Agents-based Intrusion Detection System for Voice over Internet Protocol
Chita, Christian
DOI : 10.14288/1.0051579
URI : http://hdl.handle.net/2429/5599
Degree : Master of Science - MSc
Graduation Date : 2008-11

A hybrid P2P pre-release distribution framework for flash crowd avoidance in P2P video on demand streaming
Chiu, Stanley Kai Him
DOI : 10.14288/1.0051225
URI : http://hdl.handle.net/2429/4067
Degree : Master of Science - MSc
Graduation Date : 2008-11

Model-based active learning in hierarchical policies
Cora, Vlad M.
DOI : 10.14288/1.0051276
URI : http://hdl.handle.net/2429/737
Degree : Master of Science - MSc
Graduation Date : 2008-05

Automatic juxtaposition of source files
Davis, Samuel
DOI : 10.14288/1.0051401
URI : http://hdl.handle.net/2429/1607
Degree : Master of Science - MSc
Graduation Date : 2008-11

Supporting conceptual queries over integrated sources of program information
De Alwis, Brian
DOI : 10.14288/1.0051181
URI : http://hdl.handle.net/2429/695
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-05

A software developer explores a software system by asking and answering a series of questions. To answer these questions, a developer may need to consult various sources providing information about the program, such as the static relationships expressed directly in the source code, the run-time behaviour of a program recorded in a dynamic trace, or evolution history as recorded in a source management system. Despite the support afforded by software exploration tools, developers often struggle to find the necessary information to answer their questions and may even become disoriented, where they feel mentally lost and are uncertain of what they were trying to accomplish. This dissertation advances a thesis that a developer's questions, which we refer to as conceptual queries, can be better supported through a model to represent and compose different sources of information about a program. The basis of this model is the sphere, which serves as a simple abstraction of a source of information about a program. Many of the software exploration tools used by a developer can be represented as a sphere. Spheres can be composed in a principled fashion such that information from a sphere may replace or supplement information from a different sphere. Using our sphere model, for example, a developer can use dynamic runtime information from an execution trace to replace information from the static source code to see what actually occurred. We have implemented this model in a configurable tool, called Ferret. We have used the facilities provided by the model to implement 36 conceptual queries identified from the literature, blogs, and our own experience, and to support the integration of four different sources of program information. Establishing correspondences between similar elements from different spheres allows a query to bridge across different spheres in addition to allowing a tool's user interface to drive queries from other sources of information. 
Through this effort we show that the sphere model broadens the set of possible conceptual queries answerable by software exploration tools. Through a small diary study and a controlled experiment, both involving professional software developers, we found that developers used the conceptual queries that were available to them and reported finding Ferret useful.
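The sphere abstraction can be pictured as a common query interface over information sources, with composition deciding which source answers a given question. The following is a hypothetical sketch of that idea, not Ferret's actual API:

```python
class Sphere:
    """A source of information about a program (illustrative sketch)."""
    def __init__(self, name, relations):
        # relations: dict mapping (kind, element) -> set of related elements
        self.name = name
        self.relations = relations

    def query(self, kind, element):
        return self.relations.get((kind, element))

class ComposedSphere:
    """Compose spheres so earlier ones replace answers from later ones."""
    def __init__(self, *spheres):
        self.spheres = spheres

    def query(self, kind, element):
        for sphere in self.spheres:
            result = sphere.query(kind, element)
            if result is not None:
                return result
        return None

# Hypothetical data: static call graph vs. calls observed in a dynamic trace.
static = Sphere("static", {("calls", "main"): {"parse", "render", "log"}})
dynamic = Sphere("dynamic", {("calls", "main"): {"parse", "render"}})

# Dynamic information replaces the static answer where it is available.
combined = ComposedSphere(dynamic, static)
print(combined.query("calls", "main"))  # the dynamic answer: parse, render
```

Reordering the composition, ComposedSphere(static, dynamic), would instead let the static source answer first, which mirrors the replace-or-supplement semantics the abstract describes.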

Exploiting structure for scalable software verification
Babić, Domagoj
DOI : 10.14288/1.0051356
URI : http://hdl.handle.net/2429/1502
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-11

Software bugs are expensive. Recent estimates by the US National Institute of Standards and Technology claim that the cost of software bugs to the US economy alone is approximately 60 billion USD annually. As society becomes increasingly software-dependent, bugs also reduce our productivity and threaten our safety and security. Decreasing these direct and indirect costs represents a significant research challenge as well as an opportunity for businesses. Automatic software bug-finding and verification tools have the potential to revolutionize the software engineering industry by improving reliability and decreasing development costs. Since software analysis is in general undecidable, automatic tools have to use various abstractions to make the analysis computationally tractable. Abstraction is a double-edged sword: coarse abstractions, in general, yield easier verification, but also less precise results. This thesis focuses on exploiting the structure of software for abstracting away irrelevant behavior. Programmers tend to organize code into objects and functions, which effectively represent natural abstraction boundaries. Humans use such structural abstractions to simplify their mental models of software and to construct informal explanations of why a piece of code should work. A natural question to ask is: how can automatic bug-finding tools exploit the same natural abstractions? This thesis offers possible answers. More specifically, I present three novel ways to exploit structure at three different steps of the software analysis process. First, I show how symbolic execution can preserve the data-flow dependencies of the original code while constructing compact symbolic representations of programs. Second, I propose structural abstraction, which exploits the structure preserved by the symbolic execution. Structural abstraction solves a long-standing open problem: scalable interprocedural path- and context-sensitive program analysis.
Finally, I present an automatic tuning approach that exploits the fine-grained structural properties of software (namely, data- and control-dependency) for faster property checking. This novel approach resulted in a 500-fold speedup over the best previous techniques. Automatic tuning not only redefined the limits of automatic software analysis tools, but has also found its way into other domains (like model checking), demonstrating the generality and applicability of this idea.
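One standard way to build the compact, maximally shared symbolic representations the abstract mentions is hash-consing, where structurally identical subexpressions become a single DAG node. The following is an assumption about the general technique, not code from the thesis:

```python
class Expr:
    """Hash-consed expression node: structurally equal terms share one object."""
    _table = {}

    def __new__(cls, op, *args):
        key = (op, args)
        node = cls._table.get(key)
        if node is None:
            node = super().__new__(cls)
            node.op, node.args = op, args
            cls._table[key] = node
        return node

    def __repr__(self):
        return self.op if not self.args else f"({self.op} {' '.join(map(repr, self.args))})"

# Symbolically "execute" x*y + x*y: the repeated subterm is built only once.
x, y = Expr("x"), Expr("y")
a = Expr("+", Expr("*", x, y), Expr("*", x, y))
left, right = a.args
print(left is right)  # True: the DAG shares the common subexpression
```

Sharing like this keeps symbolic state compact while preserving the data-flow dependencies of the original code, which is what later structural abstraction relies on.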

Presentation techniques for more expressive programs
Eisenberg, Andrew David
DOI : 10.14288/1.0051292
URI : http://hdl.handle.net/2429/900
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-11

We introduce a class of program editors that present a program using a rich set of transformations; we call these kinds of editors composable presentation editors. Proper use of these kinds of editors appears to lead to more expressive programs: programs whose structure is aligned with the problem they are trying to solve. By default, the composable presentation editor presents program elements textually as concrete syntax and enables typical editor commands on the program. Metadata on program elements control how the transformations are applied. Customized metadata can re-order, pictorialize, collapse, duplicate, or expand the displayed form of program elements and can additionally alter the available editor commands. We have developed a set of presentation techniques to be used by presentation designers (i.e., the programmers who design how a program is presented in the editor). These techniques relate to well-understood programming language design, editor design, and programming best-practices techniques including scoping, higher order functions, refactoring, prettyprinting, naming conventions, syntax highlighting, and text hovers. We introduce two implementations of composable presentation editors and a number of examples showing how programs can be made more expressive when presentation techniques are properly used. The first implementation is the ETMOP, an open editor, where a metaobject protocol is provided that allows language and editor designers to customize the way program elements are displayed. These customizations are called presentation extensions, and the corresponding presentation extension protocol acts much as syntax macros do in extending the syntax of a language. The second implementation is Embedded CAL, a closed editor that uses these presentation techniques to embed one language (CAL) inside a host language (Java) without changing the syntax or compiler of either language.

On the design and implementation of decision-theoretic, interactive, and vision-driven mobile robots
Elinas, Pantelis
DOI : 10.14288/1.0051411
URI : http://hdl.handle.net/2429/245
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-05

We present a framework for the design and implementation of visually-guided, interactive, mobile robots. Essential to the framework's robust performance is our behavior-based robot control architecture, enhanced with a state-of-the-art decision-theoretic planner that takes into account the temporal characteristics of robot actions and allows us to achieve principled coordination of complex subtasks implemented as robot behaviors/skills. We study two different models of the decision-theoretic layer: Multiply Sectioned Markov Decision Processes (MSMDPs), under the assumption that the world state is fully observable by the agent, and Partially Observable Markov Decision Processes (POMDPs), which remove the latter assumption and allow us to model the uncertainty in sensor measurements. The MSMDP model utilizes a divide-and-conquer approach for solving problems with millions of states using concurrent actions. For solving large POMDPs, we present heuristics that improve the computational efficiency of the point-based value iteration algorithm while tackling the problem of multi-step actions using Dynamic Bayesian Networks. In addition, we describe a state-of-the-art simultaneous localization and mapping algorithm for robots equipped with stereo vision. We first present the Monte Carlo algorithm sigmaMCL for robot localization in 3D using natural landmarks identified by their appearance in images. Secondly, we extend sigmaMCL and develop the sigmaSLAM algorithm for solving the simultaneous localization and mapping problem for visually-guided, mobile robots. We demonstrate our real-time algorithm mapping large, indoor environments in the presence of large changes in illumination, image blurring and dynamic objects. Finally, we demonstrate empirically the applicability of our framework for developing interactive, mobile robots capable of completing complex tasks with the aid of a human companion.
We present an award winning robot waiter for serving hors d'oeuvres at receptions and a robot for delivering verbal messages among inhabitants of an office-like environment.
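Monte Carlo localization, the family sigmaMCL belongs to, maintains a particle set over robot poses and repeatedly weights and resamples particles by how well they predict the observations. A toy one-dimensional sketch, with hypothetical landmarks and noise model (the thesis works in 3D with visual landmarks):

```python
import math
import random

def monte_carlo_localization(landmarks, moves, observations, n=500, seed=1):
    """1-D Monte Carlo localization sketch: particles over positions on a
    line; the robot senses its (noisy) distance to the nearest landmark."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 10.0) for _ in range(n)]
    for move, obs in zip(moves, observations):
        # Motion update: shift every particle, with a little process noise.
        particles = [p + move + rng.gauss(0, 0.05) for p in particles]
        # Measurement update: weight by agreement with the observation.
        weights = []
        for p in particles:
            predicted = min(abs(p - lm) for lm in landmarks)
            err = predicted - obs
            weights.append(math.exp(-err * err / (2 * 0.1 ** 2)))
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n)
    return sum(particles) / n  # posterior mean position

# Hypothetical run: one landmark at 2.0; the robot starts near 1.0 and
# moves +1.0 per step, so its true positions are 2.0, 3.0, 4.0.
estimate = monte_carlo_localization([2.0], [1.0, 1.0, 1.0], [0.0, 1.0, 2.0])
```

The particle cloud starts spread over the whole line and collapses onto positions consistent with the observation history; sigmaMCL applies the same filtering cycle to 3D poses with appearance-based landmark likelihoods.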

An all-at-once approach to nonnegative tensor factorizations
Flores Garrido, Marisol
DOI : 10.14288/1.0051271
URI : http://hdl.handle.net/2429/1606
Degree : Master of Science - MSc
Graduation Date : 2008-11

Usermode kernel : running the kernel in userspace in VM environments
George, Sharath
DOI : 10.14288/1.0051274
URI : http://hdl.handle.net/2429/2858
Degree : Master of Science - MSc
Graduation Date : 2008-11

Monte Carlo integration in discrete undirected probabilistic models
Hamze, Firas
DOI : 10.14288/1.0051447
URI : http://hdl.handle.net/2429/744
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-05

This thesis contains the author's work in and contributions to the field of Monte Carlo sampling for undirected graphical models, a class of statistical models commonly used in machine learning, computer vision, and spatial statistics; the aim is to be able to use the methodology and resultant samples to estimate integrals of functions of the variables in the model. Over the course of the study, three different but related methods were proposed and have appeared as research papers. The thesis consists of an introductory chapter discussing the models considered, the problems involved, and a general outline of Monte Carlo methods. The three subsequent chapters contain versions of the published work. The second chapter, which has appeared in (Hamze and de Freitas 2004), is a presentation of new MCMC algorithms for computing the posterior distributions and expectations of the unknown variables in undirected graphical models with regular structure. For demonstration purposes, we focus on Markov Random Fields (MRFs). By partitioning the MRFs into non-overlapping trees, it is possible to compute the posterior distribution of a particular tree exactly by conditioning on the remaining trees. These exact solutions allow us to construct efficient blocked and Rao-Blackwellised MCMC algorithms. We show empirically that tree sampling is considerably more efficient than other partitioned sampling schemes and the naive Gibbs sampler, even in cases where loopy belief propagation fails to converge. We prove that tree sampling exhibits lower variance than the naive Gibbs sampler and other naive partitioning schemes using the theoretical measure of maximal correlation. We also construct new information theory tools for comparing different MCMC schemes and show that, under these, tree sampling is more efficient. Although the work discussed in Chapter 2 exhibited promise on the class of graphs to which it was suited, there are many cases where limiting the topology is quite a handicap.
The work in Chapter 3 was an exploration of an alternative methodology for approximating functions of variables representable as undirected graphical models of arbitrary connectivity with pairwise potentials, as well as for estimating the notoriously difficult partition function of the graph. The algorithm, published in (Hamze and de Freitas 2005), fits into the framework of sequential Monte Carlo methods rather than the more widely used MCMC, and relies on constructing a sequence of intermediate distributions which get closer to the desired one. While the idea of using “tempered” proposals is known, we construct a novel sequence of target distributions where, rather than dropping a global temperature parameter, we sequentially couple individual pairs of variables that are, initially, sampled exactly from a spanning tree of the variables. We present experimental results on inference and estimation of the partition function for sparse and densely-connected graphs. The final contribution of this thesis, presented in Chapter 4 and also in (Hamze and de Freitas 2007), emerged from some empirical observations that were made while trying to optimize the sequence of edges to add to a graph so as to guide the population of samples to the high-probability regions of the model. Most important among these observations was that while several heuristic approaches, discussed in Chapter 1, certainly yielded improvements over edge sequences consisting of random choices, strategies based on forcing the particles to take large, biased random walks in the state-space resulted in a more efficient exploration, particularly at low temperatures. This motivated a new Monte Carlo approach to treating complex discrete distributions. The algorithm is motivated by the N-Fold Way, which is an ingenious event-driven MCMC sampler that avoids rejection moves at any specific state. The N-Fold Way can however get “trapped” in cycles.
We surmount this problem by modifying the sampling process to result in biased state-space paths of randomly chosen length. This alteration does introduce bias, but the bias is subsequently corrected with a carefully engineered importance sampler.
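The blocked tree-sampling move of Chapter 2 relies on sampling a whole tree exactly, conditioned on the rest of the model. For a chain (the simplest tree), this can be done by forward filtering and backward sampling. The sketch below assumes a simple Ising parameterization (fields h, couplings J) purely for illustration; it is not the thesis's code:

```python
import math
import random

def sample_chain(h, J, rng):
    """Draw an exact sample from a chain Ising model with fields h[i] and
    couplings J[i] between spins i and i+1, by forward filtering /
    backward sampling (the building block of blocked tree sampling)."""
    n = len(h)
    # Forward pass: msg[i][s] is proportional to the total weight of
    # configurations of spins 0..i with spin i fixed to s.
    msg = [{-1: 0.0, 1: 0.0} for _ in range(n)]
    for s in (-1, 1):
        msg[0][s] = math.exp(h[0] * s)
    for i in range(1, n):
        for s in (-1, 1):
            msg[i][s] = math.exp(h[i] * s) * sum(
                msg[i - 1][t] * math.exp(J[i - 1] * t * s) for t in (-1, 1))
    # Backward pass: sample the last spin from its marginal, then each
    # spin conditioned on the sampled spin to its right.
    spins = [0] * n
    z = msg[n - 1][-1] + msg[n - 1][1]
    spins[n - 1] = 1 if rng.random() < msg[n - 1][1] / z else -1
    for i in range(n - 2, -1, -1):
        w = {s: msg[i][s] * math.exp(J[i] * s * spins[i + 1]) for s in (-1, 1)}
        z = w[-1] + w[1]
        spins[i] = 1 if rng.random() < w[1] / z else -1
    return spins

rng = random.Random(0)
sample = sample_chain(h=[0.5, 0.0, -0.5], J=[1.0, 1.0], rng=rng)
```

In the blocked scheme, conditioning a tree on its neighbours amounts to folding the neighbouring fixed spins into the local fields h before running this exact sampler.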

On detecting and repairing inconsistent schema mappings
Ho, Terence Cheung-Fai
DOI : 10.14288/1.0051351
URI : http://hdl.handle.net/2429/4126
Degree : Master of Science - MSc
Graduation Date : 2008-11

Multilevel multidimensional scaling on the GPU
Ingram, Stephen F.
DOI : 10.14288/1.0051260
URI : http://hdl.handle.net/2429/409
Degree : Master of Science - MSc
Graduation Date : 2008-05

Interactive tools for biomechanical modeling and realistic animation
Kaufman, Andrew
DOI : 10.14288/1.0051231
URI : http://hdl.handle.net/2429/1492
Degree : Master of Science - MSc
Graduation Date : 2008-11

Bayesian cluster validation
Koepke, Hoyt Adam
DOI : 10.14288/1.0051357
URI : http://hdl.handle.net/2429/1496
Degree : Master of Science - MSc
Graduation Date : 2008-11

Shared displays to support collaborative exploration of ocean summits
Lai, Sherman
DOI : 10.14288/1.0051249
URI : http://hdl.handle.net/2429/711
Degree : Master of Science - MSc
Graduation Date : 2008-05

Visual exploratory analysis of large data sets : evaluation and application
Lam, Heidi Lap Mun
DOI : 10.14288/1.0051365
URI : http://hdl.handle.net/2429/839
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-11

Large data sets are difficult to analyze. Visualization has been proposed to assist exploratory data analysis (EDA), as our visual systems can process signals in parallel to quickly detect patterns. Nonetheless, designing an effective visual analytic tool remains a challenge. This challenge is partly due to our incomplete understanding of how common visualization techniques are used by human operators during analyses, either in laboratory settings or in the workplace. This thesis aims to further understand how visualizations can be used to support EDA. More specifically, we studied techniques that display multiple levels of visual information resolutions (VIRs) for analyses, using a range of methods. The first study is a summary synthesis conducted to obtain a snapshot of knowledge in multiple-VIR use and to identify research questions for the thesis: (1) low-VIR use and creation; (2) spatial arrangements of VIRs. The next two studies are laboratory studies investigating the visual memory cost of image transformations frequently used to create low-VIR displays, and overview use with single-level data displayed in multiple-VIR interfaces. For a better-rounded evaluation, we needed to study these techniques in ecologically valid settings. We therefore selected the application domain of web session log analysis and applied our knowledge from our first three evaluations to build a tool called Session Viewer. Taking the multiple coordinated view and overview + detail approaches, Session Viewer displays multiple levels of web session log data and multiple views of session populations to facilitate data analysis from the high-level statistical to the low-level detailed session analysis approaches. Our fourth and last study for this thesis is a field evaluation conducted at Google Inc. with seven session analysts using Session Viewer to analyze their own data with their own tasks.
Study observations suggested that displaying web session logs at multiple levels using the overview + detail technique helped bridge between high-level statistical and low-level detailed session analyses, and the simultaneous display of multiple session populations at all data levels using multiple views allowed quick comparisons between session populations. We also identified design and deployment considerations to meet the needs of diverse data sources and analysis styles.

Towards smooth particle filters for likelihood estimation with multivariate latent variables
Lee, Anthony
DOI : 10.14288/1.0051412
URI : http://hdl.handle.net/2429/1547
Degree : Master of Science - MSc
Graduation Date : 2008-11

Semantics-based resource discovery in global-scale grids
Li, Juan
DOI : 10.14288/1.0051187
URI : http://hdl.handle.net/2429/2845
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-11

Grid computing is a virtualized distributed computing environment aimed at enabling the sharing of geographically distributed resources. Grid resources have traditionally consisted of dedicated supercomputers, clusters, or storage units. With the present ubiquitous network connections and the growing computational and storage capabilities of modern everyday-use computers, more resources such as PCs, devices (e.g., PDAs and sensors), applications, and services are joining grid networks. The grid is expected to evolve from a computing and data management facility to a pervasive, world-wide resource-sharing infrastructure. To fully utilize the wide range of grid resources, effective resource discovery mechanisms are required. However, resource discovery in a global-scale grid is challenging due to the considerable diversity, large number, dynamic behavior, and geographical distribution of the resources. The resource discovery technology required to achieve the ambitious global grid vision is still in its infancy, and existing applications have difficulties in achieving both rich searchability and good scalability. In this thesis, we investigate the resource discovery problem for open-networked global-scale grids. In particular, we propose a distributed semantics-based discovery framework. We show how this framework can be used to address the discovery problem in such grids and improve three aspects of performance: expressiveness, scalability, and efficiency. Expressiveness is the first characteristic that a grid resource-searching mechanism should have. Most existing search systems use simple keyword-based lookups, which limit the searchability of the system. Our framework improves search expressiveness in two directions: first, it uses a semantic metadata scheme to provide users with a rich and flexible representation mechanism, to enable effective descriptions of desired resource properties and query requirements.
Second, we employ ontological domain knowledge to assist in the search process. The system is thus able to interpret query requests according to their meanings in a specific domain, which helps it locate only semantically related results. The more expressive the resource descriptions and query requests, however, the more difficult it is to design a scalable and efficient search mechanism. We ensure scalability by reconfiguring the network with respect to shared ontologies. This reconfiguration partitions the large, unorganized search space into multiple well-organized, semantically related sub-spaces that we call semantic virtual organizations. Semantic virtual organizations help to discriminatively distribute resource information and queries to related nodes, thus reducing the search space and improving scalability. To further improve the efficiency of searching within the virtual organizations, we propose two semantics-based resource-integrating and searching systems: GONID and OntoSum. These two systems address searching problems for applications based on different network topologies: structured and unstructured peer-to-peer overlay networks. Queries in both systems are processed transparently, so that users accessing the data are insulated from the fact that the information is distributed across different sources and represented in different formats. In both systems, ontological knowledge is decomposed into coarse-grained elements, and these elements are indexed with different schemes to fit the requirements of different applications. Resource metadata reasoning, integration, and searching are all based on this index. A complex query can be evaluated by performing relational operations such as select, project, and join on combinations of the indexing elements. We evaluate the performance of our system with extensive simulation experiments, the results of which confirm the effectiveness of the design. 
In addition, we implement a prototype that incorporates our ontology-based virtual organization formation and semantics-based query mechanisms. Our deployment of the prototype verifies the system's feasibility and its applicability to real-world applications.
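The query model this abstract describes, in which resource metadata is decomposed into indexed elements and complex queries are evaluated with relational operations such as select, project, and join, can be sketched minimally in a centralized setting. The triple store, resource names, and properties below are invented for illustration; the thesis's GONID and OntoSum systems distribute such an index over peer-to-peer overlays rather than holding it in one place.

```python
# Minimal, centralized sketch of evaluating a resource query with
# relational operations over indexed metadata elements. All data here
# is hypothetical; the real systems shard this index across the overlay.

# Resource metadata decomposed into (subject, predicate, object) triples.
triples = [
    ("node1", "type", "ComputeResource"),
    ("node1", "cpuCores", 16),
    ("node1", "memoryGB", 64),
    ("node2", "type", "StorageResource"),
    ("node2", "capacityTB", 4),
]

def select(pred, obj=None):
    """SELECT: triples matching a predicate (and optionally an object)."""
    return [t for t in triples
            if t[1] == pred and (obj is None or t[2] == obj)]

def project(rows, position):
    """PROJECT: keep a single column of the matched triples."""
    return {t[position] for t in rows}

def join(left, right):
    """JOIN: subjects that appear in both intermediate result sets."""
    return left & right

# Query: "compute resources with at least 16 CPU cores".
compute = project(select("type", "ComputeResource"), 0)
big = project([t for t in select("cpuCores") if t[2] >= 16], 0)
print(join(compute, big))  # {'node1'}
```

In a distributed deployment, each select would be routed to the semantic virtual organization holding the relevant index partition, and only the projected intermediate sets would travel between nodes.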

A content-based publish/subscribe framework over structured peer-to-peer networks
Li, Wei
DOI : 10.14288/1.0051469
URI : http://hdl.handle.net/2429/5645
Degree : Master of Science - MSc
Graduation Date : 2008-11

Improving disk read performance through block-level replication into free space
Lifchits, Andrei
DOI : 10.14288/1.0051359
URI : http://hdl.handle.net/2429/892
Degree : Master of Science - MSc
Graduation Date : 2008-05

Lacome: a cross-platform multi-user collaboration system for a shared large display
Liu, Zhangbo
DOI : 10.14288/1.0051402
URI : http://hdl.handle.net/2429/378
Degree : Master of Science - MSc
Graduation Date : 2008-05

An empirical analysis of lexical polarity and contextual valence shifters for opinion classification
Longton, Adam
DOI : 10.14288/1.0051409
URI : http://hdl.handle.net/2429/4180
Degree : Master of Science - MSc
Graduation Date : 2008-11

wypy : an extensible, online interference detection tool for wireless networks
Lotun, Reza M. E.
DOI : 10.14288/1.0051207
URI : http://hdl.handle.net/2429/515
Degree : Master of Science - MSc
Graduation Date : 2008-05

Web personalization based on association rules finding on both static and dynamic Web data
Lu, Minghao
DOI : 10.14288/1.0051369
URI : http://hdl.handle.net/2429/4162
Degree : Master of Science - MSc
Graduation Date : 2008-11

Semi-supervised and active training of conditional random fields for activity recognition
Mahdaviani, Maryam
DOI : 10.14288/1.0051289
URI : http://hdl.handle.net/2429/346
Degree : Master of Science - MSc
Graduation Date : 2008-05

JQuery - a tool for combining query results and a framework for building code perspectives
Markle, Lloyd
DOI : 10.14288/1.0051360
URI : http://hdl.handle.net/2429/1888
Degree : Master of Science - MSc
Graduation Date : 2008-11

Parallax : volume management for virtual machines
Meyer, Dutch Thomassen
DOI : 10.14288/1.0051297
URI : http://hdl.handle.net/2429/2612
Degree : Master of Science - MSc
Graduation Date : 2008-11

Support for time-sensitive applications via cooperative polling
Saubhasik, Mayukh
DOI : 10.14288/1.0051441
URI : http://hdl.handle.net/2429/5406
Degree : Master of Science - MSc
Graduation Date : 2008-11

A Levenberg-Marquardt method for large-scale bound-constrained nonlinear least-squares
Shan, Shidong
DOI : 10.14288/1.0051461
URI : http://hdl.handle.net/2429/5648
Degree : Master of Science - MSc
Graduation Date : 2008-11

Achieving predictable timing and fairness through cooperative polling
Sinha, Anirban
DOI : 10.14288/1.0051182
URI : http://hdl.handle.net/2429/436
Degree : Master of Science - MSc
Graduation Date : 2008-05

MMSP : an alternative transport protocol for multiple co-existing networks
Song, Haoran
DOI : 10.14288/1.0051233
URI : http://hdl.handle.net/2429/5417
Degree : Master of Science - MSc
Graduation Date : 2008-11

On managing visibility of resources in social networking sites
Su, Ying
DOI : 10.14288/1.0051240
URI : http://hdl.handle.net/2429/5506
Degree : Master of Science - MSc
Graduation Date : 2008-11

Supervised machine learning for email thread summarization
Ulrich, Jan
DOI : 10.14288/1.0051210
URI : http://hdl.handle.net/2429/2363
Degree : Master of Science - MSc
Graduation Date : 2008-11

OmniStream : using centralized tree management for incentives-compatible peer-to-peer media streaming
Upadhyaya, Ankur
DOI : 10.14288/1.0051239
URI : http://hdl.handle.net/2429/5503
Degree : Master of Science - MSc
Graduation Date : 2008-11

Fluid surface reconstruction from particles
Williams, Brent Warren
DOI : 10.14288/1.0051410
URI : http://hdl.handle.net/2429/906
Degree : Master of Science - MSc
Graduation Date : 2008-05

An adaptive packet size approach to TCP congestion control
Wilson, Steven
DOI : 10.14288/1.0051234
URI : http://hdl.handle.net/2429/5527
Degree : Master of Science - MSc
Graduation Date : 2008-11

Multi-view hockey tracking with trajectory smoothing and camera selection
Wu, Lan
DOI : 10.14288/1.0051270
URI : http://hdl.handle.net/2429/2402
Degree : Master of Science - MSc
Graduation Date : 2008-11

Controllable, non-oscillatory damping for deformable objects
Young, Herbert David
DOI : 10.14288/1.0051414
URI : http://hdl.handle.net/2429/217
Degree : Master of Science - MSc
Graduation Date : 2008-05

Multiagent learning and empirical methods
Zawadzki, Erik P.
DOI : 10.14288/1.0051290
URI : http://hdl.handle.net/2429/2480
Degree : Master of Science - MSc
Graduation Date : 2008-11

PSS : a phonetic search system for short text documents
Zhang, Jerry Jiaer
DOI : 10.14288/1.0051236
URI : http://hdl.handle.net/2429/5537
Degree : Master of Science - MSc
Graduation Date : 2008-11

Evaluations on XML standards for actual applications
Zhang, Jiemin
DOI : 10.14288/1.0051237
URI : http://hdl.handle.net/2429/5538
Degree : Master of Science - MSc
Graduation Date : 2008-11

Spatial trend prefetching for online maps mashups
Zhang, Jun
DOI : 10.14288/1.0051241
URI : http://hdl.handle.net/2429/5540
Degree : Master of Science - MSc
Graduation Date : 2008-11

Online experimenter : an evaluation of experiments conducted under local and remote conditions
Zhang, Ying
DOI : 10.14288/1.0051242
URI : http://hdl.handle.net/2429/5542
Degree : Master of Science - MSc
Graduation Date : 2008-11

Discovering and summarizing email conversations
Zhou, Xiaodong
DOI : 10.14288/1.0051392
URI : http://hdl.handle.net/2429/337
Degree : Doctor of Philosophy - PhD
Graduation Date : 2008-05

With the ever-increasing popularity of email, it is now common for people to discuss specific issues, events, or tasks with a group of people over email. These discussions can be viewed as conversations via email and are valuable to the user as a personal information repository. For instance, ten minutes before a meeting, a user may want to quickly review a previous email discussion of a topic about to come up in that meeting. In this case, rather than reading each individual email one by one, it is preferable to read a concise summary of the previous discussion that preserves the major information. In this thesis, we study the problem of discovering and summarizing email conversations. We believe that this work can greatly help users manage their email folders. However, the characteristics of email conversations, e.g., the lack of synchronization, the conversational structure, and the informal writing style, make this task particularly challenging. We tackle it by considering the following aspects: discovering the emails in one conversation, capturing the conversation structure, and summarizing the email conversation. We first study how to discover all emails belonging to one conversation. Specifically, we study the hidden email problem, which is important for email summarization and other applications but has not been studied before. We propose a framework to discover and regenerate hidden emails. The empirical evaluation shows that this framework is accurate and scales to large folders. Second, we build a fragment quotation graph to capture email conversations. The hidden emails belonging to each conversation are also included in the corresponding graph. Based on the quotation graph, we develop a novel email conversation summarizer, ClueWordSummarizer. A comparison with a state-of-the-art email summarizer, as well as with a popular multi-document summarizer, shows that ClueWordSummarizer obtains higher accuracy in most cases. 
Furthermore, to address the characteristics of email conversations, we study several ways to improve ClueWordSummarizer by considering additional lexical features. The experiments show that many of these improvements significantly increase the accuracy, especially those based on subjective words and phrases.
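The clue-word idea behind this summarizer, favoring sentences whose words recur in neighbouring fragments of the quotation graph, can be sketched as follows. The fragments, whitespace tokenizer, and scoring below are simplified illustrations, not the thesis's actual implementation, which builds the fragment quotation graph from quoted email text and applies stemming.

```python
# Hedged sketch of clue-word sentence scoring: words that reappear in
# the fragments adjacent to a fragment in the quotation graph ("clue
# words") mark sentences worth keeping in the summary.
import re

def tokens(text):
    """Crude tokenizer; the thesis additionally stems words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def clue_word_scores(fragment, neighbour_fragments):
    """Score each sentence of `fragment` by how many of its words also
    occur in the neighbouring fragments of the quotation graph."""
    neighbour_words = set().union(*(tokens(f) for f in neighbour_fragments))
    sentences = [s.strip() for s in fragment.split(".") if s.strip()]
    return [(len(tokens(s) & neighbour_words), s) for s in sentences]

# Hypothetical reply fragment and the quoted fragments it responds to.
reply = "We should move the meeting to Friday. I also fixed the build."
quoted = ["Can we move Thursday's meeting?", "Friday works for me."]

# The meeting sentence shares four words with the quoted context; the
# unrelated build sentence shares none.
for score, sentence in sorted(clue_word_scores(reply, quoted), reverse=True):
    print(score, sentence)
```

Sentences with the highest clue-word counts would then be extracted, up to the desired summary length.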




ICICS/CS Building 201-2366 Main Mall
Vancouver, B.C. V6T 1Z4 Canada
Tel: 604-822-3061 | Fax: 604-822-5485
General: help@cs.ubc.ca
Undergrad program: undergrad-info@cs.ubc.ca
Graduate program: grad-info@cs.ubc.ca
