CS Theses & Dissertations 2024

For 2024 graduation dates (in alphabetical order by last name):

Personalizing explanations of AI hints based on user characteristics in an intelligent tutoring system
Bahel, Vedant Rajesh
DOI : 10.14288/1.0443819
URI : http://hdl.handle.net/2429/88387
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Cristina Conati

A formal framework for understanding runtime checking errors in gradually typed languages
Bañados Schwerter, Felipe Andres
DOI : 10.14288/1.0441320
URI : http://hdl.handle.net/2429/87780
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Ron Garcia

Although Abstracting Gradual Typing provides a systematic approach to design gradual languages, the original framework has limitations: first, it accepts design choices that lead to type inconsistencies sneaking through evaluation. Second, when a type inconsistency is identified at run time, evaluation halts without providing any feedback on the parts of the program related to the failure, a safe approach yet unhelpful for debugging. This dissertation addresses these two limitations of the Abstracting Gradual Typing framework. For the first limitation, I impose an extra constraint on the acceptable designs for gradual types: forward completeness of every type operation. This stricter constraint guarantees that, throughout evaluation, gradual types and runtime evidence objects cannot lose precision and will only represent information consistent with the original static type system. I introduce a new design for a gradual language with record subtyping that fulfills this restriction. For the second limitation, I provide a specification for runtime program slicing that can be systematically applied to languages designed using Abstracting Gradual Typing. Slicing can separate the portions of a program that are guaranteed to be uninvolved in a runtime failure. Unlike the standard blame approach, slicing does not assume that types are correct. The slicing semantics can be used to provide a debugging tool, and I apply empirical research methods to explore whether this runtime type slicing approach is useful to developers.

Algorithms For Geometry Partitioning and Reshaping
Barbosa de Souza Araujo, Chrystiano
DOI : 10.14288/1.0445453
URI : http://hdl.handle.net/2429/89297
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Alla Sheffer

Virtual digital content, including 2D vector clip art and 3D meshes, has experienced a significant surge in popularity in recent years. The increasing availability of these digital assets on virtual platforms enables professionals and amateurs to find and repurpose existing content to suit their specific needs. Whether it involves customizing assets to align with creative preferences or satisfying the constraints of downstream applications, such as those imposed by digital manufacturing technologies, editing digital assets remains a highly complex and time-consuming task. This complexity poses a significant barrier to the widespread adoption of virtual platforms and digital manufacturing technologies. In this thesis, we investigate novel and easy-to-use approaches for critical operations in digital content editing: reshaping and volumetric partitioning. First, we address the challenges of 2D/3D reshaping. When editing digital content, users often desire to customize existing assets to generate new looks and styles while preserving their original structure. However, the lack of automatic tools suitable for reshaping tasks leads users to rely on labor-intensive and complex modeling tasks. We introduce novel user-centric algorithmic solutions for reshaping 2D vector clip art and 3D meshes, enabling users to effortlessly produce outputs that align with their expectations of reshaping operations. We rigorously validate our methods across various inputs and by comparing our outputs to those produced by alternative approaches and professional artists. The second part of this thesis focuses on 3D geometry partitioning. When producing physical replicas of 3D digital content, users often desire to fabricate objects with multi-attribute surface regions (e.g. distinct colors or materials). However, manufacturing these objects as single pieces can be challenging or even impossible. 
An alternative solution is to partition such objects into single-attribute parts that allow per-part fabrication and subsequent assembly. To overcome the complexity of performing this operation manually, we introduce a novel easy-to-use algorithm for surface-segmentation conforming and assemblable volumetric partitioning. The robustness of our method is demonstrated on a variety of complex models, and it is validated via comparisons with alternative approaches. The algorithmic solutions presented in this thesis enable end-users to effortlessly customize digital content to meet their reshaping goals or to comply with the constraints of multi-attribute 3D fabrication.

Exploring cultural competence in language and multimodal models
Bhatia, Mehar
DOI : 10.14288/1.0445242
URI : http://hdl.handle.net/2429/89107
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Vered Shwartz

Gaussian shadow casting for neural characters
Bolanos, Luis Andoni
DOI : 10.14288/1.0441531
URI : http://hdl.handle.net/2429/88000
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Helge Rhodin

Computational experiment comprehension using provenance summarization
Boufford, Nichole Chelsea
DOI : 10.14288/1.0440963
URI : http://hdl.handle.net/2429/87659
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Thomas Pasquier

Tool coordination in software development workspaces
Bradley, Nicholas
DOI : 10.14288/1.0445504
URI : http://hdl.handle.net/2429/89344
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Reid Holmes & Dr. Thomas Fritz

Building and evolving modern software systems requires developers to use multiple tools within their workspaces. Unfortunately, the design of current workspaces, where tools operate independently of each other, leaves the manual and laborious orchestrating of these independent tools to developers. The constant tool coordination, navigation, and configuration represents unnecessary work that acts as a form of friction that impedes developer productivity. In particular, the workspace's tool-centric design silos the information developers need for their tasks. Due to the diversity of projects and tasks developers work on, even advanced tools like the IDE will never be able to solve this unnecessary work for all developers. Instead, new approaches are needed to overcome challenges developers encounter locating and aligning information between their tools. In this thesis, we examine the friction that developers experience from the tool-centric design of current workspaces, and explore two context-centric approaches, Helm and Scout, which use contextual information across tools to help mitigate the friction developers encounter when locating and aligning information necessary for their tasks. Our first approach, Helm, is a lightweight system that automatically captures contextual information about the developer's project and the resources they access to reduce the effort of manually re-finding resources. Helm uses this context to recommend resources, including source code files, web pages, and commands, from a central location enabling developers to more directly navigate between resources. Our second approach, Scout, uses contextual information about the developer's source code to help the developer locate API information on the web. Scout tailors search results to the developer's task by extracting, ranking, and presenting the most relevant API signature information directly within the IDE where the information is needed.
Through two controlled studies, one with 17 developers using Helm and another with 40 developers using Scout, we analyzed how the approaches affected the way developers located information in their workspace. We found that, by using contextual information across tools, approaches can significantly help developers locate information for their tasks. Building on these findings, we establish a new vision for the modern workspace where tools share contextual information to proactively support developers' tasks and improve productivity by reducing unnecessary friction.

Computing the attention center of a simple polygon
Brown, Dylan Michael
DOI : 10.14288/1.0444830
URI : http://hdl.handle.net/2429/88711
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Will Evans

Compilation as normalization : a multi-language semantics approach to compiler correctness
Bryant, Lily Anna
DOI : 10.14288/1.0445241
URI : http://hdl.handle.net/2429/89101
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. William Bowman

The computation of meaning : from embodied emotions to cognitive schemas
Bucci, Paul
DOI : 10.14288/1.0447082
URI : http://hdl.handle.net/2429/89487
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Ivan Beschastnikh

How do we compute meaning? To make something computable, we must reduce the world to logical operations on electrical signals. However, our human experience is that the world has an uncomputable, meaningful aspect that seems to defy mere information processing. The quantitative world of computing demands measurable, objective signals to be translated into the qualitative world of affect, emotion, and meaning. Is it possible to make the two worlds of qualitative and quantitative meet? In this dissertation, I report on, analyze, and draw conclusions from two multi-part projects that attempt to answer this question from different perspectives using interactive systems and machine learning. First, we look at computing meaning by attempting to detect emotions using signals derived from the body such as heart rate, brain waves, and gestures. Then, we look at computing meaning by making connections between documents to support thematic exploration of large document corpora. My contributions in this dissertation are: (1) a critical theoretical and methodological proposition for computationally representing, sensing, and displaying real-time emotions; (2) a synthesis of the theoretical and pragmatic basis of therapeutic care methods and their meaning for affective robotics, with an accompanying account of the constructed nature of emotions for HRI applications; (3) the design and evaluation of a system (called Teleoscope) for capturing underlying meaning in documents through interaction with machine learning systems; and (4) an extension to thematic analysis for data curation to create meaning in large text datasets, which we call thematic exploration, and a methodological concept of schema crystallization. Through these projects, an underlying understanding of meaning-making as an embedded, embodied, emergent, interactive phenomenon is articulated.
That is to say, meaning is embedded in a culture and environment, embodied in the whole of a person, and emerges through the process of interaction between a person, themselves, other people, and their environment. By understanding these epiphenomenal interactions, designers may be enabled to create computational systems that facilitate richer meaning-making.

From devices to data and back again : a tale of computationally modelling affective touch
Cang, Xi Laura
DOI : 10.14288/1.0442005
URI : http://hdl.handle.net/2429/88065
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Karon MacLean

Emotionally responsive Human-Robot Interaction (HRI) has captured our curiosity and imagination in fantastical ways throughout much of modern media. With touch being a valuable yet sorely missed emotion communication channel when in-person interaction is unrealistic for practical reasons, we could look to machine-mediated ways to bridge that distance. In this thesis, we investigate how we might enable machines to recognize natural and spontaneous emotional touch expressions in two parts. First, we take a close look at ways machines engage with human emotion by examining examples of machines in three emotionally communicative roles: as a passive witness receiving and logging the emotional state of their (N=30) human counterparts, as an influential actor whose own breathing behaviour alters human fear response (N=103), and as a conduit for the transmission of emotion expression between human users (N=10 dyads and N=21 individuals). Next, we argue that in order for devices to be truly emotionally reactive, they should address the time-varying and dynamic nature of emotional lived experience. Any computational or emotion recognition engine intended for use under realistic conditions should acknowledge that emotions will evolve over time. Machine responses may change with changing ‘emotion direction’ – acting in an encouraging way when the user is ‘happy and getting happier’ vs. presenting calming behaviours for ‘happy but getting anxious’. To that end, we develop a multi-stage emotion self-reporting procedure for collecting N=16 users’ dynamic emotion expression during videogame play. From their keypress force controlling their in-game character, we benchmark individualized recognition performance for emotion direction, even finding it to exceed that of brain activity (as measured by continuous Electroencephalography (EEG)).
For a proof-of-concept of a training process that generates models of true and spontaneous emotion expression evolving with the user, we then revise our protocol to be more flexible to naturalistic emotion expression. We build a custom tool to help with data collection and labelling of personal storytelling sessions and evaluate user impressions (N=5 with up to 3 stories each for a total of 10 sessions). Finally, we conclude with actionable recommendations for advancing the training and machine recognition of naturalistic and dynamic emotion expression.

Data-driven models of human body inertia
Chen, Guanxiong
DOI : 10.14288/1.0442032
URI : http://hdl.handle.net/2429/88062
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Dinesh Pai

Separating biological processes in single-cell data with deep generative models
Chen, Sarah Wendy
DOI : 10.14288/1.0445562
URI : http://hdl.handle.net/2429/89411
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Jiarui Ding

Visual question answering with contextualized commonsense knowledge
Chinchure, Aditya Aravind
DOI : 10.14288/1.0441296
URI : http://hdl.handle.net/2429/87765
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Leonid Sigal & Dr. Renjie Liao (EECE)

Machine learning for spectroscopic data analysis : challenges of limited labelled data
Dirks, Matthew
DOI : 10.14288/1.0438638
URI : http://hdl.handle.net/2429/87209
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. David Poole

Extracting meaningful information from spectra, such as sample composition, proves to be challenging. Building prediction models with supervised learning requires labelled data which is often limited. To overcome the challenge of limited data, this thesis explores various strategies spanning the gamut from models reliant on domain knowledge to those primarily data-driven. Leveraging domain knowledge, an approximate, fully-differentiable X-ray fluorescence (XRF) simulator is developed and used in two models. In the first model, the simulator is fit to an observed spectrum. The resulting parameter values are mapped to element concentrations by regression modelling. In the second model, the simulator is embedded in an auto-encoder (AE) neural network. The AE learns the inverse function of the simulator while also adapting to the data when instrument or environment parameters are unavailable. An experiment comparing the AE to standard regression models found improved predictions for 11 elements. Another AE model is developed that uses more general domain knowledge about spectra, which applies to any type of spectrum containing peak-shaped structures. With this model, a statistically significant decrease in prediction error compared to state-of-the-art models was found for predicting tin concentration (with p < 0.00001) in the results of 10×10-fold cross-validation, and it was tied for best on 11 out of 32 elements. A benefit of both AE models is that they can utilize unlabelled data in semi-supervised learning to lower the requirements for ground truth data. Neural networks require extensive hyperparameter optimization (HPO) which depends on validation data to estimate performance accurately. HPO works poorly when the validation set score is noisy; noisy validation scores are typical of small datasets. Ensembling is used to lower the variance, resulting in a neural network configuration that performs as well as an expertly-chosen configuration. 
A final prediction model combines information from multiple spectrometers, which is particularly challenging for small datasets. Several sensor fusion methods are compared, including a parallel-input convolutional neural network (CNN). Results of 10-fold cross-validation found that high-level PLS-based methods were best, though neural network models were competitive.

Formal specification and verification techniques for mutable references and advanced aliasing in Rust
Ewert, David George
DOI : 10.14288/1.0438326
URI : http://hdl.handle.net/2429/86986
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Alexander Summers

Practical ad hoc tangible interactions in augmented reality
Fan, Xu
DOI : 10.14288/1.0438330
URI : http://hdl.handle.net/2429/86989
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Robert Xiao

Towards alleviating human supervision for document-level relation extraction
Feng, Yuxi
DOI : 10.14288/1.0441405
URI : http://hdl.handle.net/2429/87864
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Laks Lakshmanan

Motivated by various downstream applications, there is tremendous interest in the automatic construction of knowledge graphs (KG) by extracting relations from text corpora. Relation Extraction (RE) from unstructured data sources is a key component for building large-scale KG. In this thesis, I focus on the research centered on Document Level Relation Extraction. One challenge of Document Level Relation Extraction is the lack of labeled training data since the construction of a large in-domain labeled dataset would require a large amount of human labor. To alleviate human supervision on document-level relation extraction, I propose 1) an unsupervised RE method CIFRE which enhances the recall of pipeline-based approaches while keeping high precision; 2) a semi-supervised RE method DuRE when few labeled data are available, by leveraging self-training to generate pseudo text. In order to improve the quality of pseudo text, I also propose two methods (DuNST and KEST) to improve the controllability and diversity of semi-supervised text generation, solving the challenges of inadequate unlabeled data, overexploitation, and training deceleration. Comprehensive experiments on real datasets demonstrate that our proposed methods significantly outperform all baselines, proving the effectiveness of our methods in unsupervised and semi-supervised document-level relation extraction.

[no title]
Garg, Anubhav
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Danica Sutherland

AvatARoid : using a motion-mapped AR overlay to bridge the embodiment gap between robot and teleoperator in robot-mediated telepresence
Ghimire, Amit
DOI : 10.14288/1.0438038
URI : http://hdl.handle.net/2429/86769
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Dongwook Yoon

Streaming algorithms with differential privacy guarantee
Gong, Emily (Zehui)
DOI : 10.14288/1.0445304
URI : http://hdl.handle.net/2429/89161
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Nick Harvey

Learning temporal action chunking for motor control
Gou, Ruiyu
DOI : 10.14288/1.0445184
URI : http://hdl.handle.net/2429/89057
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Michiel van de Panne

Semantically consistent video inpainting with conditional diffusion models
Green, Dylan Scott
DOI : 10.14288/1.0445038
URI : http://hdl.handle.net/2429/88917
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Mark Schmidt & Dr. Frank Wood

ProtoCloud : a generative prototypical self-explanatory model for cell type annotation
Guo, Kaiyun
DOI : 10.14288/1.0445300
URI : http://hdl.handle.net/2429/89164
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Jiarui Ding

Flexible Conditioning in Generative Models of Images and Video
Harvey, William
DOI : 10.14288/1.0445139
URI : http://hdl.handle.net/2429/89004
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Frank Wood

Recent advances in the field of deep generative modelling are leading to increasingly faithful models of real-world data including images and videos. Of particular practical interest are conditional generative models, which parameterise conditional probability distributions given data features. Flexibly-conditional generative models are more flexible than conventional conditional models in the sense that they allow any data features to be conditioned on. This makes them applicable to tasks like image inpainting where we want the same model that can, e.g., inpaint the top half of an image, to also be capable of, e.g., inpainting the bottom half. Flexible conditioning has previously been demonstrated for data types including fixed-size images and short videos, but our thesis is that it can be enabled in a much broader variety of settings. The first setting we will consider is long-video generation, which is normally problematic because the data is high-dimensional and compute constraints can prevent our model from conditioning on all possible frames. The second is where the data dimensionality (e.g. number of frames in a video) is stochastic and can depend on what we condition on. We present techniques to enable flexible conditioning in both of these settings. We further show that the resulting models can sometimes improve on baselines in terms of sample quality even for conventional generation tasks. Another barrier to flexibly-conditional modelling has been the computational cost of training any high-quality generative models on moderate- or high-resolution visual data. We therefore end by presenting the first technique to mitigate this cost for the training of flexibly-conditional variational auto-encoders by incorporating pretrained unconditional model weights.

Structured representation learning by controlling generative models
He, Xingzhe
DOI : 10.14288/1.0440629
URI : http://hdl.handle.net/2429/87531
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Helge Rhodin

Object correspondence and structure play critical roles in image generation, 3D reconstruction and animation. In recent years, supervised algorithms have dramatically improved the accuracy of the learned correspondence. However, these approaches are expensive due to manual annotation and do not generalize well to new domains. We propose several methods in this dissertation on unsupervised structure learning from casually recorded images and videos. To be specific, we propose a Generative Adversarial Network (GAN)-based unsupervised keypoint detector and extend it for object part segmentation. Furthermore, we introduce a representation for unsupervised keypoints relationship estimation. We later adapted this technique for few-shot keypoint learning, depth prediction, and occlusion handling. In addition, we propose a dataset generation approach for diffusion model personalization to implicitly learn the object structure and appearance. The overarching goal of this dissertation is to make progress in the field of unsupervised object correspondence and structure learning. Our proposed methods outperform existing unsupervised methods on the established keypoint estimation and part segmentation benchmarks and pave the way for structure-conditioned generative models on more diverse datasets.

Reconciling the model-implementation duality in PGo
Hosseini, Seyed Shayan
DOI : 10.14288/1.0437579
URI : http://hdl.handle.net/2429/86484
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Ivan Beschastnikh

Empowering student query debugging : feedback for aggregate queries via provenance summarization
Huang, Jingxuan
DOI : 10.14288/1.0444013
URI : http://hdl.handle.net/2429/88504
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Rachel Pottinger

Methods for design of efficient on-device natural language processing architectures
Jawahar, Ganesh
DOI : 10.14288/1.0441384
URI : http://hdl.handle.net/2429/87848
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Muhammad Abdul-Mageed & Dr. Laks Lakshmanan

Deep learning based models often achieve state-of-the-art performance in a wide range of natural language processing (NLP) tasks, which include open-ended tasks (e.g., story generation, brainstorming, and chat) and closed-ended tasks (e.g., summarization, question answering, and rewriting). To further enhance quality, there is a growing interest in scaling the model size and the amount of data used for training. These research efforts often overlook the impact of footprint metrics, such as high latency, high memory usage, and high energy consumption, on these deep learning models. A high footprint makes these models significantly inefficient for deployment on servers and devices such as tablets, handhelds, and wearables. Methods for improving model efficiency often come at the cost of degrading model quality. In this dissertation, we address the central question: how can we push the envelope in improving the efficiency-quality tradeoff of deep learning models for on-device NLP tasks? To this end, we propose methods that take on-device efficiency constraints (e.g., ≤ 16 MB memory or ≤ 200 ms latency) to inform the design of the model architecture. We propose methods for the manual design of architecture for the auto-completion task (generate continuations for user-written prompts) that enjoy a better memory-accuracy tradeoff than existing auto-completion models (Chapter 2). Additionally, we introduce methods that can directly take efficiency constraints to automatically search for efficient sparsely activated architectures for machine translation tasks (Chapter 3) and efficient pretrained (task-agnostic) language modeling architectures (Chapter 4). Finally, in Chapter 5, we explore a novel use case of employing large language models to speed up architecture search, while maintaining the efficiency and quality of state-of-the-art neural architecture search algorithms.

Exploring the potential of LLMs for biomedical relation extraction
Kanwal, Swati
DOI : 10.14288/1.0440989
URI : http://hdl.handle.net/2429/87686
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Laks Lakshmanan

PANORAMIA : privacy auditing of machine learning models without retraining
Kazmi, Mishaal
DOI : 10.14288/1.0445452
URI : http://hdl.handle.net/2429/89301
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Mathias Lécuyer & Dr. Ivan Beschastnikh

Improving language models with novel contrastive learning objectives
Khondaker, MD Tawkat Islam
DOI : 10.14288/1.0438339
URI : http://hdl.handle.net/2429/87003
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Muhammad Abdul-Mageed & Dr. Laks Lakshmanan

Partwise model predictive control for interactive contact-guided motion synthesis
Khoshsiyar, Niloofar
DOI : 10.14288/1.0440693
URI : http://hdl.handle.net/2429/87587
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Michiel van de Panne

Why do machine learning optimizers that work, work?
Kunstner, Frederik Dieter
DOI : 10.14288/1.0445444
URI : http://hdl.handle.net/2429/89294
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Mark Schmidt

The impressive recent applications of machine learning have coincided with an increase in the costs of developing new methods. Beyond the obvious computational cost due to the large dataset, the more insidious cost is complexity. The development of machine learning systems is not yet predictable and instead relies on rules of thumb and expensive trial and error. This thesis focuses on the fundamental methods used to build machine learning models, which “learn” by solving a numerical optimization problem to fit a model to data. Many successful optimization heuristics have emerged in recent years, but we have little understanding of why those heuristics outperform classical optimization schemes. The goal of this thesis is to build a better understanding of why methods that are widely used in machine learning work, presenting theoretical or empirical contributions in 3 areas. Algorithms are often designed to tackle specific problems with a known structure, like the expectation maximization (EM) algorithm for statistical models with missing data. Our current optimization theory ignores this structure and relies on assumptions that are not satisfied, even by textbook applications. We derive the convergence rate of EM for the most common case of exponential families, without additional assumptions. Many heuristics have been proposed to build “adaptive” methods that automatically adjust per-coordinate step-sizes, but these heuristics are often brittle as there is no formal definition of “adaptivity”. We formalize the problem as finding per-coordinate step-sizes that are competitive with the optimal ones for a given problem, and develop an algorithm that provably finds competitive step-sizes. For recently developed language models, the Adam algorithm outperforms gradient descent by such a large margin that it is now the default option. But the reason for this improvement is not clear. 
We empirically evaluate hypotheses proposed to explain this performance gap, showing that the gap is not due to noise, and isolate a feature of language problems that leads to optimization difficulties. We show that heavy-tailed class imbalance, where many rare classes have a large impact on the objective function, leads to a performance gap and to correlated gradients and Hessians, hypothesized to benefit Adam.

[no title]
Li, Haley
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Mathias Lécuyer & Dr. Thomas Pasquier

Building a practical provenance-based intrusion detection and reporting system
Liang, Jinyuan
DOI : 10.14288/1.0441425
URI : http://hdl.handle.net/2429/87873
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Thomas Pasquier

On the efficiency and privacy of foundation models
Liu, Michael Finn
DOI : 10.14288/1.0445291
URI : http://hdl.handle.net/2429/89134
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Nick Harvey

Computational tools for complex electronic auctions
Newman, Neil
DOI : 10.14288/1.0440947
URI : http://hdl.handle.net/2429/87648
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Kevin Leyton-Brown

This thesis has two main concerns. The first is infrastructure that allows complex, electronic markets to function, ranging from web applications to highly specialized clearing algorithms. The second is developing computational methods to assess alternative market designs. I describe efforts developing and deploying computational infrastructure in support of markets in two very different domains: subsistence agriculture and radio spectrum allocation. I detail practical experiences (a) running a feature-phone based marketplace for agricultural trade built to match farmers with traders in developing countries, and (b) designing a solver to overcome the computational challenge of station repacking in the recent US "incentive" spectrum auction. I then present a series of three computational methods for evaluating alternative market designs, beginning with a setting where plausible models of bidding behavior are known, then relaxing this assumption and studying single-action and later sequential games.

Utilizing short-read, long-read and single-cell RNA sequencing for isoform discovery and detection
Orabi, Baraa
DOI : 10.14288/1.0445193
URI : http://hdl.handle.net/2429/89071
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Faraz Hach & Dr. Raymond Ng

Alternative splicing is an essential cellular mechanism in humans that enables increased protein diversity and tissue differentiation. Thus, the study of alternative splicing is of great importance in our endeavour to address a variety of human diseases such as cancer. Different transcriptomic sequencing technologies have been deployed to investigate alternative splicing, each with its own trade-offs: short-read sequencing, one of the most commonly used sequencing techniques, has the advantage of a low per-base error rate but suffers from short read lengths that limit its capacity to resolve most alternative splicing events accurately; long-read sequencing, a more recent technology, is able to sequence the full length of most alternative splicing transcripts but suffers from a high sequencing error rate that introduces non-trivial processing challenges; single-cell sequencing, which traditionally relies on short-read sequencing, enables cell-level resolution of transcriptomic sequencing and gene expression analysis but is severely limited in its capacity to resolve alternative splicing events; and finally, hybrid single-cell sequencing that utilizes both short- and long-read sequencing has the potential to enable cell-level gene expression analysis and alternative splicing detection but requires significant computational effort to synthesize the data of its two underlying sequencing techniques. 
In my dissertation, I present my work on three computational methods that enable the detection of alternative splicing using transcriptomic sequencing technologies: (i) Freddie detects alternatively spliced isoforms using long-read sequencing; (ii) scTagger maps long-reads to their error-corrected cellular barcodes in hybrid short- and long-read single-cell transcriptomic sequencing experiments; and (iii) scFreddie detects alternatively spliced isoforms in the context of the mentioned hybrid single-cell transcriptomic sequencing technique utilizing the output of tools such as scTagger.

Weakly-supervised geometry-aware novel view synthesis
Perryman, Olivia Margot
DOI : 10.14288/1.0445271
URI : http://hdl.handle.net/2429/89126
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Kwang Moo Yi & Dr. Helge Rhodin

On effective learning for multimodal data
Rahman, Tanzila
DOI : 10.14288/1.0442340
URI : http://hdl.handle.net/2429/88225
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Leonid Sigal

Humans can perceive the world through multiple modalities. Strong behavioral scientific evidence suggests that such ability, which includes implicit information integration and cross-modal alignment inherent in it, is critical for human learning. Nevertheless, until relatively recently, most deep learning methods have primarily focused on addressing single-modality issues associated with learning from vision, sound, or text. In recent years, however, researchers have started to focus on multi-modal learning, specifically emphasizing high-level visual comprehension challenges like image-text matching, video captioning, and generation of audiovisual content. In this thesis, we aim to broaden the scope of learning from multimodal information, enhance its integration, and solve problems related to human-centric spatio-temporal perception in a manner that does not necessarily require complete supervision (e.g., granular spatio-temporal multi-modal alignment). Specifically, we focus on addressing two fundamental challenges: (1) Multimodal learning; and (2) Weak-supervision. We address these challenges across a range of diverse tasks. First, we focus on weakly-supervised dense video captioning, where we combine audio with visual features to improve state-of-the-art performance. We also show that audio itself can carry a surprising amount of information, compared to existing visual-only models. Second, we introduce an end-to-end audio-visual co-segmentation network to recognize individual objects and corresponding sounds using only object labels, without requiring any additional supervision or bounding box proposals. Third, we propose TriBERT, a transformer-based architecture with co-attention, that learns contextual features across three modalities: vision, pose, and audio. We show that these features are general and improve performance on a variety of tasks spanning audio-visual sound source separation and cross-modal retrieval.
Fourth, we delve into generative text-to-image (TTI) models, specifically to address consistency when generating complex story visualizations by augmenting diffusion models with a memory module. Finally, we look at aspects of personalization within TTI. This allows us to generate diverse visuals for custom and user-specified concepts (e.g., a specific person, dog, etc.). Throughout our comprehensive analysis of these tasks within this thesis, we present significant algorithmic, theoretical, and empirical contributions to the field of multimodal machine learning and computer vision.

Pay to (not) play : monetizing impatience in mobile games
Raman, Narun
DOI : 10.14288/1.0438636
URI : http://hdl.handle.net/2429/87213
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Kevin Leyton-Brown

CUTTANA : scalable graph partitioning for faster distributed graph databases and analytics
Rezaei Hajidehi, Milad
DOI : 10.14288/1.0441386
URI : http://hdl.handle.net/2429/87855
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Margo Seltzer

Enriching block-based end-user programming with visual features
Ritschel, Nico
DOI : 10.14288/1.0437515
URI : http://hdl.handle.net/2429/86428
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Reid Holmes & Dr. Ron Garcia

Today, most programmers are not professional software developers, but end-users with limited training and experience in programming. End-user-friendly programming languages and tools aim to support this type of user, and many use visual programming aids to do so. Block-based programming is a popular visual programming style that has been effectively used in computer science education and is the foundation for many modern end-user programming tools. Because of the popularity of block-based programming, language designers can use a rich set of existing technologies that save them the effort of creating visual programming designs from scratch. However, many language designers ignore that block-based programming was created with learners in mind, who have different needs than end-users. Especially when programs grow in size and complexity, blocks offer little support to help end-users understand and edit programs effectively. In our work, we augment block-based programming with visual features that extend the range of programs that end-users can comprehend and write. In particular, we create languages and environments for the domain of robotics programming that allow end-users to write larger and more expressive programs. We focus on three scenarios that represent challenges that end-users face in this domain: coordinating multiple robots that work in tandem, writing large programs that span several workstations in different locations, and reacting to external signals such as machines or user interactions. For each environment, we first discuss the limitations of existing work in the areas of block-based and end-user programming. We present and discuss the design of our visual extensions with the goal to maintain end-user-friendliness. Finally, we evaluate our work through empirical studies, both formative to inform our designs and summative to demonstrate their benefits. 
Our designs, and the empirical and analytical process that we applied to create them, both contribute to a stronger understanding of how to build end-user-centric tools. We further believe that although our work focuses on the domain of robotics, these contributions transfer to other areas of end-user programming as well.

Temporal hypergraph representation learning : from predicting future interactions in networks to anomaly detection in the human brain
Sadeghian, Sadaf
DOI : 10.14288/1.0441321
URI : http://hdl.handle.net/2429/87781
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Margo Seltzer

Privacy, experts, and martingales : an investigation on the use of analytical tools
Sanches Portella, Victor
DOI : 10.14288/1.0445228
URI : http://hdl.handle.net/2429/89084
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Nick Harvey

In this thesis, we describe new results on three problems in learning theory and probability theory: private estimation of Gaussian covariance matrices, prediction with experts’ advice, and the expected norm of martingales. Interestingly, in all of them one of the key ingredients is the use of analytical tools in mathematics. Estimation of the covariance matrix of a Gaussian distribution from samples is a classical problem in statistics that has been thoroughly studied in the literature. Recently, researchers proposed the model of differential privacy, a mathematical framework to provide formal guarantees on the amount of sensitive information leaked by algorithms. This compelled researchers to revisit classical problems such as covariance matrix estimation to better understand the limits of statistical estimation under differential privacy. In this thesis we provide tight lower bounds on the accuracy of estimation of Gaussian covariance matrices under the broadest regime of parameters compared to previously known lower bounds. The framework of prediction with experts’ advice is a theoretical model where a player and an adversary play a game with multiple rounds. At each round, the player selects one of multiple experts whose advice to follow while the adversary decides on the cost of following the advice of each expert. In this thesis, we study a continuous-time model of the experts’ problem and provide several results in this setting (which often translate to the discrete problem), with a focus on anytime strategies, that is, those that do not require knowledge of the length of the game. We describe new anytime algorithms with best-known guarantees against the top-quantile of experts in hindsight. Moreover, we show an anytime strategy in continuous time whose guarantees against independent experts match the guarantees of optimal algorithms in the fixed-time setting.
Finally, motivated by our investigations of the continuous-time experts’ problem, we study the problem of bounding the infinity norm of high-dimensional martingales under a large class of stopping times. We show asymptotically tight upper and lower bounds on the expected norm of a range of continuous- and discrete-time martingales, generalizing results known for one-dimensional martingales.

Exploring the influence of prototyping fidelity on feedback for an app with multi-sided interaction model
Sanobar, Yaman Fawaz
DOI : 10.14288/1.0437959
URI : http://hdl.handle.net/2429/86690
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Ian Mitchell

Adaptive randomized smoothing : certifying multi-step defences against adversarial examples
Shaikh, Mohammed Shadab Salauddin
DOI : 10.14288/1.0445441
URI : http://hdl.handle.net/2429/89295
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Mathias Lécuyer

Investigating ML potentials and deep generative models for efficient conformational sampling
Shenoy, Nikhil
DOI : 10.14288/1.0444096
URI : http://hdl.handle.net/2429/88586
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Jiarui Ding & Dr. Dominique Beaini

From videos to animatable 3d neural characters
Su, Shih-Yang
DOI : 10.14288/1.0440644
URI : http://hdl.handle.net/2429/87534
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Helge Rhodin

Realistic 3D human models have extensive applications across various domains, including entertainment, healthcare, sports, fashion, and more. The challenges in recreating life-like, high-fidelity virtual humans lie in capturing the subtle, nuanced expressions and intricate, complex body dynamics. Consequently, the human digitalization process often requires sophisticated, tailor-made multi-camera capture studios and high-precision motion-tracking systems, limiting accessibility to only a selected few. While recent developments in deep learning have made modeling virtual characters from videos possible, existing approaches still rely on template meshes and 3D surface priors constructed from accurate 3D scans, labels, and multi-view captures. In this dissertation, we take steps in template-free 3D digitalization, enabling 3D animatable human modeling directly from video footage without 3D annotations and surface priors. Our important contributions include: 1) an analysis-by-synthesis framework for jointly learning 3D body shape, appearance, and pose directly from monocular videos; 2) a disentangled body feature representation without pre-defined 3D surfaces for sample-efficient learning and unseen animation generalization; 3) a memory-efficient factorized volume representation for capturing local appearance and geometry structures; 4) a hybrid human body model combining point-based and neural-fields representations for creating 3D avatars with detailed and consistent appearances. In conclusion, we develop approaches that build upon each other to advance the technologies for accessible human digitalization.

Understanding semantics and geometry of scenes
Suhail, Mohammed
DOI : 10.14288/1.0441006
URI : http://hdl.handle.net/2429/87712
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Leonid Sigal

In this dissertation, we present new approaches for structured scene understanding from images and videos. Structured scene understanding finds numerous applications, including in robotics and autonomous vehicles, as well as in 3D content creation and video editing. The focus of this research is on three specific tasks: scene graph generation, novel view synthesis, and layered scene representation. Scene graph generation involves creating a graph structure that represents the objects and their relationships in a scene. Generating a scene graph from an image demands a thorough understanding of the constituent objects and their associations. Our exploration delves into integrating the often overlooked structure of the output space into the reasoning framework. Additionally, we extend beyond bounding box granularity by leveraging pixel-level masks to ground objects when such annotations are absent in scene graph datasets. Novel view synthesis involves generating new views of a scene from input images. Achieving this demands a deep comprehension of the scene's underlying geometry to ensure the rendering of pixels aligns seamlessly with the scene's structure. Within this dissertation, our exploration centers on methods capable of accurately rendering scenes, particularly when dealing with non-Lambertian surfaces. Moreover, we address the challenge of developing view-synthesis techniques capable of generating new scene perspectives without necessitating training for each scene. Layered scene representation involves decomposing a scene into different semantically meaningful layers. In our pursuit of this task, we confront the constraints inherent in existing methods when handling videos with parallax effects resulting from homography-based modeling. To address this, our exploration focuses on a methodology aimed at learning a three-dimensional (3D) layered representation.
This approach aims to surpass these limitations and facilitate a more comprehensive scene decomposition. The main contributions of this thesis thus include the exploration and advancement of these tasks.

Exploring explicit models for geometric point cloud learning
Sun, Weiwei
DOI : 10.14288/1.0445525
URI : http://hdl.handle.net/2429/89368
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Kwang Moo Yi & Dr. Andrea Tagliasacchi

We are interested in processing point clouds -- sets of unordered points -- specifically in Euclidean space, such as a 3D point cloud acquired from a range sensor (LiDAR) or a 4D correspondence cloud in a stereo matching task. Point clouds play an increasingly essential role in many tasks due to their prevalence. However, it is notoriously challenging to process point clouds with deep neural networks because of their irregular data structure, the difficulty in encoding contextual information from nearby points, and the large amount of compute that is typically required. This thesis addresses these challenges by enforcing intermediate features or model parameters to carry specific meanings such as attention and poses, leading to explicit representation. The meanings of explicit representation allow for traditional ways of manipulating features in order to solve target tasks. We refer to these architectures with explicit representations as explicit models. Explicit models substantially improve performance without massively scaling up training data or model size because the explicit representation directly injects the prior knowledge needed by target tasks into neural networks without any learning. We explore explicit models for point cloud learning to perform robust estimation, stereo matching, segmentation, reconstruction and neural rendering. The thesis is organized into four chapters: (1) ACNe, an optimization-inspired network architecture that allows learning with point clouds contaminated with an abundance of outliers; (2) Canonical Capsules, an equivariant latent representation that consists of pose and pose-invariant features, enabling point cloud auto-encoding in unaligned datasets; (3) NeuralBF, a novel 3D instance proposal generation method inspired by traditional bilateral filtering for top-down instance segmentation of 3D point clouds;
and (4) PointNeRF++, a multi-scale, point-based NeRF architecture, allowing seamless integration of point-based representation with Neural Radiance Fields. Across these four chapters, we show that explicit models substantially improve point cloud learning, inspiring more future research in this domain. We conclude with a discussion of future work, practical tips on how to form an explicit model, and its role in the era of large foundation models.

CasCalib : cascaded calibration for motion capture from sparse unsynchronized cameras
Tang, James
DOI : 10.14288/1.0437869
URI : http://hdl.handle.net/2429/86650
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Helge Rhodin & Dr. Bastian Wandt

Deblurring neural radiance fields by modeling camera imperfections and using RGB-event stereo
Tang, Wei Zhi
DOI : 10.14288/1.0445355
URI : http://hdl.handle.net/2429/89208
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Kwang Moo Yi

Exploring equivalence and differences in software methods
Teixeira, Tarcisio Soares
DOI : 10.14288/1.0436917
URI : http://hdl.handle.net/2429/86051
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Reid Holmes

Investigation of different social agents in reinforcement learning for autonomous driving training in simulation
Tian, Yuan
DOI : 10.14288/1.0438656
URI : http://hdl.handle.net/2429/87226
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Ian Mitchell

Automatic verification of heap-dependent folds in Viper
Tokaeo, Toto (Peeranat)
DOI : 10.14288/1.0438623
URI : http://hdl.handle.net/2429/87193
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Alexander Summers

Discrete optimization problems in geometric mesh processing
Vining, Nicholas William Edward
DOI : 10.14288/1.0437353
URI : http://hdl.handle.net/2429/86347
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Alla Sheffer

The full abstract for this item is available in the body of the item, and will be available when the embargo expires (on 2025-10-31).

QuAC : quick attribute-centric type inference for Python
Wu, Jifeng
DOI : 10.14288/1.0445179
URI : http://hdl.handle.net/2429/89056
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Caroline Lemieux

Generative spectra modelling for galaxy redshift estimation
Xie, Zhuoting
DOI : 10.14288/1.0445029
URI : http://hdl.handle.net/2429/88909
Degree : Master of Science - MSc
Graduation Date : 2024-11
Supervisor : Dr. Kwang Moo Yi & Dr. Sebastien Fabbro

Versatile neural approaches to more accurate and robust topic segmentation
Xing, Linzi
DOI : 10.14288/1.0440128
URI : http://hdl.handle.net/2429/87475
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-05
Supervisor : Dr. Giuseppe Carenini

Topic segmentation, as a fundamental NLP task, has been proposed and systematically studied since the 1980s and received increased attention in recent years due to the surge in big data. It aims to unveil the coarse-grained semantic structure of long unstructured documents by automatically dividing them into shorter, topically coherent segments. The coarse-grained structure provided by topic segmentation has been proven to not only enhance human reading efficiency but also play a vital role in other natural language understanding tasks, such as text summarization, question answering, and dialogue modeling. Before the neural era, early computational models for topic segmentation typically adhered to unsupervised paradigms with lexical cohesion directly derived from the input, yet their performance was notably limited. With the evolution of deep learning and enhanced computational capabilities, neural models have delivered significant progress in performance. Nevertheless, inadequate coherence modeling, in terms of both explicitness and reliability in these neural approaches, prevents them from emerging as more accurate and robust solutions for topic segmentation. Additionally, the growing prevalence of multi-modal data content across social media platforms has heightened the need for topic segmentation to traverse beyond mere text, extending into videos. Motivated by the challenges and needs mentioned above, in this thesis, we direct our efforts towards enhancing neural topic segmentation for two types of documents: text and video. To overcome the inadequate coherence modeling (explicitness and reliability) in neural topic segmenters for text, we propose a series of methods that either more explicitly model coherence patterns or leverage coherence signals encoded in related auxiliary tasks, notably discourse parsing and language modeling.
For video content, we explore extending neural topic segmenters, originally designed for text, into a multi-modal setting that is also robust to the often-encountered drastic variance in video length. A comprehensive set of experimental results indicates that our methods not only effectively enhance the overall performance of neural segmenters for text and video in intra-domain scenarios, but also broaden their applicability to data in other domains.

Differentially private neural tangent kernels for privacy-preserving data generation and distillation
Yang, Yilin
DOI : 10.14288/1.0441283
URI : http://hdl.handle.net/2429/87742
Degree : Master of Science - MSc
Graduation Date : 2024-05
Supervisor : Dr. Mijung Park & Dr. Xiaoxiao Li

AI-powered methods for academic assessment : overcoming scalability challenges in large university classrooms and conference review
Zarkoob, Hedayat
DOI : 10.14288/1.0445198
URI : http://hdl.handle.net/2429/89059
Degree : Doctor of Philosophy - PhD
Graduation Date : 2024-11
Supervisor : Dr. Kevin Leyton-Brown

In this thesis, we use various AI techniques to address several scalability challenges in two academic environments: large university classrooms and large peer-review conferences. In large university classrooms, two main challenges that instructors face are grading open-ended assignments and facilitating in-class discussions. To tackle the issue of grading open-ended assignments at scale, we use ideas from mechanism design and graphical models to design practical peer grading systems that provide strong incentives for students to be truthful and that accurately aggregate reported grades. To facilitate in-class discussions, we develop and analyze a new web-based participation tool designed to encourage active participation from students of different demographics. For large peer-reviewed conferences, we propose a novel reviewer-paper matching approach that uses machine learning and mixed-integer programming techniques to preserve the quality of reviews by finding better matches between reviewers and papers and using reviewer resources more efficiently. To demonstrate the effectiveness of the innovations introduced, we evaluate each innovation through analysis on both real and synthetic data, as well as through survey data.