Events

Name: Chenwei Zhang

Date: December 15, 2025

Time: 9:30 AM

Location: ICCS X836

Co-Supervisors: Anne Condon and Khanh Dao Duc

Title of Thesis: Applications of Deep Generative Models in DNA Reaction Kinetics and Cryogenic Electron Microscopy

Abstract:
This dissertation explores how deep generative models can advance the analysis of challenging biological problems by integrating domain knowledge with cutting-edge deep learning techniques. It focuses on two fundamental areas: DNA reaction kinetics and cryogenic electron microscopy (cryo-EM).

In the first part, we present ViDa, a biophysics-informed deep learning framework that leverages variational autoencoders (VAEs) and the geometric scattering transform to generate biophysically plausible embeddings of DNA reaction kinetics simulations. These embeddings are further reduced to a two-dimensional Euclidean space to visualize DNA hybridization and toehold-mediated three-way strand displacement reactions. By embedding simulated secondary structures and reaction trajectories into a low-dimensional representation, ViDa preserves structure and clusters trajectory ensembles into distinct reaction pathways, making simulation results more interpretable and revealing new insights into reaction mechanisms.
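
For readers unfamiliar with the building blocks, the sketch below shows a minimal VAE over precomputed scattering features, using the standard reparameterization trick and an ELBO-style loss; the dimensions, class names, and loss weighting here are illustrative assumptions, not ViDa's actual architecture.

```python
# Minimal sketch: a VAE that maps precomputed scattering coefficients of DNA
# secondary structures to a low-dimensional latent space (illustrative only).
import torch
import torch.nn as nn

class ScatteringVAE(nn.Module):
    def __init__(self, in_dim=256, hidden=128, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, in_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    recon_err = ((x - recon) ** 2).sum(dim=-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return recon_err + kl

x = torch.randn(8, 256)                  # toy batch of scattering coefficients
recon, mu, logvar = ScatteringVAE()(x)
print(vae_loss(x, recon, mu, logvar))
# The latent codes (mu) would then be reduced further to 2D for visualization.
```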

In the second part, we address key challenges in cryo-EM density map interpretation and protein structure modeling. We first provide a comprehensive review and benchmarking of state-of-the-art deep learning methods for protein structure modeling (i.e., atomic model building), proposing improved evaluation metrics to assess their performance and offering guidance for researchers. We then present Struc2mapGAN, a generative adversarial network (GAN) that synthesizes high-fidelity, experimental-like cryo-EM density maps from protein structures. Finally, we present CryoSAMU, a structure-aware multimodal U-Net that enhances intermediate-resolution cryo-EM density maps by integrating density features with structural embeddings from protein large language models through cross-attention mechanisms.
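
As a rough illustration of the cross-attention fusion idea, the sketch below lets flattened density features attend to residue embeddings from a protein language model; the shapes, the projection layer, and the single-block design are assumptions for illustration, not CryoSAMU's implementation.

```python
# Minimal sketch of cross-attention between cryo-EM density features (queries)
# and protein-language-model residue embeddings (keys/values).
import torch
import torch.nn as nn

class DensityStructureCrossAttention(nn.Module):
    def __init__(self, density_dim=64, seq_dim=1280, n_heads=4):
        super().__init__()
        self.proj_seq = nn.Linear(seq_dim, density_dim)   # project PLM embeddings
        self.attn = nn.MultiheadAttention(density_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(density_dim)

    def forward(self, density_tokens, seq_embeddings):
        # density_tokens: (B, N_voxels, density_dim) flattened U-Net features
        # seq_embeddings: (B, N_residues, seq_dim) from a protein language model
        kv = self.proj_seq(seq_embeddings)
        attended, _ = self.attn(query=density_tokens, key=kv, value=kv)
        return self.norm(density_tokens + attended)       # residual fusion

x = torch.randn(2, 512, 64)     # toy density feature tokens
s = torch.randn(2, 300, 1280)   # toy residue embeddings
print(DensityStructureCrossAttention()(x, s).shape)  # torch.Size([2, 512, 64])
```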

Overall, these contributions demonstrate the potential of deep generative models to interpret DNA reaction mechanisms and to advance cryo-EM density map analysis and protein structure modeling.

-

Name: Chen Fan

Date: Friday, November 28, 2025

Time: 12:30 PM

Location: ICCS 146

Supervisors: Mark Schmidt, Christos Thrampoulidis

Thesis Title: Adaptive Step Sizes and Implicit Regularization in Optimization Models

Abstract:

Given the ever-increasing size of machine learning models, better optimization algorithms are needed to improve computational efficiency. Despite significant recent progress, we still lack an understanding of the general working principles behind common optimizers for complex tasks such as neural network training. Two criteria matter when measuring the performance of an optimizer: (1) convergence speed, i.e., the rate at which the training loss decreases, and (2) the test accuracy of the converged solution. Given these criteria, this thesis focuses on three relevant ingredients: data sampling for (stochastic) gradient computation, step sizes, and the implicit bias of optimization.

We first consider a commonly used sampling-without-replacement scheme for computing stochastic gradients, known as random reshuffling (RR). Despite its success in training deep neural networks, its theoretical justification has only recently been studied. In the over-parameterized setting, it had not been shown that RR achieves the same linear convergence rate as stochastic gradient descent (SGD). We bridge this gap by showing that the rate of RR is indeed linear and can be faster than that of SGD.
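
As a toy illustration of the difference between the two sampling schemes, the sketch below runs RR (one fresh permutation per epoch) against with-replacement SGD on a small interpolating least-squares problem; the model, step size, and epoch count are arbitrary choices, not the setting analyzed in the thesis.

```python
# Toy comparison of random reshuffling (RR) and with-replacement SGD on least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)          # consistent system, so interpolation holds

def grad_i(w, i):
    # gradient of the i-th squared-error term 0.5 * (A[i] @ w - b[i])**2
    return (A[i] @ w - b[i]) * A[i]

def run(reshuffle, epochs=100, lr=0.01):
    w = np.zeros(d)
    for _ in range(epochs):
        # RR draws a fresh permutation each epoch; plain SGD samples with replacement
        idx = rng.permutation(n) if reshuffle else rng.integers(0, n, size=n)
        for i in idx:
            w -= lr * grad_i(w, i)
    return np.linalg.norm(A @ w - b)

print("RR  residual:", run(reshuffle=True))
print("SGD residual:", run(reshuffle=False))
```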

Stochastic Polyak (SPS) and stochastic line-search (SLS) step sizes are known to converge fast under over-parameterization, thanks to their adaptivity to the local curvature of the loss. Without over-parameterization, however, they are not guaranteed to converge to the exact solution. Given this, we first extend SPS and SLS to the setting without over-parameterization. The advantage of our modifications is that the step sizes are not required to be monotonically decreasing. We then propose variants of SPS and SLS for bilevel optimization, which involves tuning two step sizes.
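
For illustration, the sketch below shows one common form of the stochastic Polyak step size with a step-size cap; the constant c, the cap, and the single-loss example are standard choices from the literature, not necessarily the exact variants developed in the thesis.

```python
# Minimal sketch of one SGD step with a (capped) stochastic Polyak step size.
import numpy as np

def sps_step(w, f_i, grad_i, f_i_star=0.0, c=0.5, eta_max=1.0):
    g = grad_i(w)
    eta = min((f_i(w) - f_i_star) / (c * g @ g + 1e-12), eta_max)  # capped Polyak step
    return w - eta * g

# Example on a single squared loss f_i(w) = 0.5 * (a @ w - y)**2.
a, y = np.array([1.0, 2.0]), 3.0
f = lambda w: 0.5 * (a @ w - y) ** 2
g = lambda w: (a @ w - y) * a
print(sps_step(np.zeros(2), f, g))
```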

Previous work on the implicit bias of optimization has focused mostly on binary classification. However, machine learning applications are typically multiclass. To this end, we study a family of optimization algorithms known as normalized steepest descent, which includes several popular algorithms, in linear multiclass classification with separable data. We show that their iterates converge to the max-margin solution defined with respect to the norms used to define the algorithms.
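
As a rough illustration of the algorithm family, the sketch below shows a single normalized-steepest-descent step for the Euclidean and infinity norms (the latter being sign descent); which norms and algorithms the thesis actually analyzes is not specified here, so this is purely an example of the general update.

```python
# One normalized-steepest-descent step with respect to a chosen norm.
import numpy as np

def nsd_step(w, grad, lr, norm="l2"):
    if norm == "l2":
        direction = grad / (np.linalg.norm(grad) + 1e-12)  # normalized gradient descent
    elif norm == "linf":
        direction = np.sign(grad)                          # sign descent
    else:
        raise ValueError(f"unsupported norm: {norm}")
    return w - lr * direction

g = np.array([0.2, -3.0, 0.0])
print(nsd_step(np.zeros(3), g, lr=0.1, norm="l2"))
print(nsd_step(np.zeros(3), g, lr=0.1, norm="linf"))
```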

 

-

Name: Shakiba Kheradmand

Date: Wednesday, November 26, 2025

Time: 10:00 AM

Location: ICCS X836

Supervisors: Kwang Moo Yi, Andrea Tagliasacchi

Title: Monte Carlo Neural Rendering

Abstract:

Recent advances in neural rendering have achieved high-quality photorealistic scene reconstruction, yet significant computational challenges remain. Neural Radiance Fields are slow to train, while 3D Gaussian Splatting (3DGS) depends on heuristic rules, is sensitive to initialization, and fixes rendering quality regardless of computational constraints.

This thesis addresses these limitations through sampling-based methods and probabilistic reformulations.

First, we present soft mining, an importance sampling approach that accelerates neural field training by focusing computation on regions with higher reconstruction error. Sampling probabilities are adapted dynamically during training via Langevin Monte Carlo, improving both convergence speed and final rendering quality.
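
To give a flavour of the idea, the sketch below performs Langevin-style updates that drift 2D sample locations toward high-error regions while injecting exploration noise; the step sizes, toy error field, and clamping are illustrative assumptions rather than the method's exact formulation.

```python
# Langevin-style update of 2D sample locations toward a high-error region.
import torch

def soft_mining_step(coords, error_fn, step=1e-3, noise_scale=1e-2):
    # coords: (N, 2) sample locations in [0, 1]^2; error_fn gives per-sample error.
    coords = coords.clone().requires_grad_(True)
    log_err = torch.log(error_fn(coords) + 1e-8).sum()
    (grad,) = torch.autograd.grad(log_err, coords)
    with torch.no_grad():
        # drift toward high error plus exploration noise
        new = coords + step * grad + noise_scale * torch.randn_like(coords)
    return new.clamp(0.0, 1.0)

# Toy error field peaked at the image centre; samples concentrate there over iterations.
error = lambda xy: torch.exp(-((xy - 0.5) ** 2).sum(dim=-1) / 0.02)
pts = torch.rand(1024, 2)
for _ in range(10):
    pts = soft_mining_step(pts, error)
print(pts.std(dim=0))  # spread shrinks as samples gather near the high-error peak
```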

Second, we reformulate 3DGS as a Markov Chain Monte Carlo process, interpreting Gaussians as probabilistic samples rather than relying on manual splitting and pruning. By introducing stochastic updates via Stochastic Gradient Langevin Dynamics, we remove the dependence on heuristic density control and good initialization, resulting in a more robust optimization process.
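
As a minimal illustration, the sketch below shows a generic Stochastic Gradient Langevin Dynamics update, a gradient step plus injected Gaussian noise, applied to toy Gaussian centres; the noise scaling and the choice of which Gaussian parameters are perturbed are assumptions, not the exact scheme used in the thesis.

```python
# Generic SGLD update: gradient step plus Gaussian noise scaled by sqrt(2 * lr * T).
import torch

def sgld_update(params, grad, lr, temperature=1.0):
    noise = torch.randn_like(params) * (2.0 * lr * temperature) ** 0.5
    return params - lr * grad + noise

# e.g. perturbing toy 3D Gaussian centres with a placeholder gradient
means = torch.randn(1000, 3)
placeholder_grad = torch.randn_like(means)
means = sgld_update(means, placeholder_grad, lr=1e-3)
print(means.shape)
```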

Finally, we introduce an order-independent stochastic transparency method for Gaussian-based rendering, eliminating the costly sorting step in traditional pipelines. This technique integrates seamlessly with hardware rasterization, improves rendering efficiency, avoids popping artifacts, and ensures compatibility across GPU architectures.
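
The CPU sketch below illustrates the general principle of stochastic transparency for a single pixel: each fragment survives a stochastic alpha test with probability equal to its alpha and is then treated as opaque, so a depth test rather than sorting resolves visibility, and averaging over samples approximates alpha blending. The fragment format and sample count are illustrative, and a real implementation would run inside a hardware rasterization pipeline rather than on the CPU.

```python
# Order-independent stochastic transparency for one pixel (educational sketch).
import numpy as np

rng = np.random.default_rng(0)

def stochastic_transparency_pixel(fragments, n_samples=256, background=0.0):
    # fragments: list of (depth, alpha, color), in arbitrary (unsorted) order
    acc = 0.0
    for _ in range(n_samples):
        survivors = [(d, c) for d, a, c in fragments if rng.random() < a]
        # nearest surviving fragment wins the depth test; min() compares by depth
        acc += min(survivors)[1] if survivors else background
    return acc / n_samples

frags = [(2.0, 0.5, 0.9), (1.0, 0.3, 0.2), (3.0, 0.8, 0.6)]
print(stochastic_transparency_pixel(frags))
```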

Collectively, these contributions make scene reconstruction faster, more robust, and more computationally efficient, advancing the practical deployment of neural rendering in real-world applications.

-

Name: Alan Milligan

Date: Tuesday, November 25th, 2025

Time: 2:00pm to 3:00pm

Location: ICCS 146

Supervisor: Mark Schmidt

Title of the thesis: What does the Adam optimizer actually adapt to?

Abstract: As the impact of machine learning on all aspects of daily life grows, so does the importance of understanding why and how it works. Despite the success of recent machine-learning-based systems, several elements of the general machine learning pipeline remain poorly understood. This lack of understanding is no longer a question strictly for academics: companies now spend millions of dollars training models, incurring massive energy and carbon costs. Without a solid understanding of the training process, predicting downstream model performance as a function of the choices made before training remains impossible. Many tricks have been discovered to address these challenges, but the tricks themselves remain poorly understood.

In particular, training machine learning models is framed as an optimization problem, but optimization theory lacks sufficient tools to analyze modern problems and often fails to explain why problems are hard or why algorithms are effective. A pointed example of this is the Adam optimization algorithm. Adam is widely successful and considered the default optimization algorithm in machine learning, yet theory predicts it to be no better than classical algorithms like gradient descent. Adam stems from a line of “adaptive optimizers” that in some way adapt to the problem, but what they adapt to is not clearly defined either.

This thesis aims to identify characteristics of optimization problems that Adam addresses, and to show how classical theoretical assumptions fail to explain Adam. We isolate heavy-tailed class imbalance in language modelling as a characteristic that makes gradient descent fail while Adam is unaffected. Further analysis shows that this characteristic leads to correlations between the gradients and Hessians of the model, a property theorized to help Adam. We then find that imbalanced features, as seen in a setting using graph neural networks, likewise cause gradient descent to fail while Adam remains effective. Finally, we further challenge existing theory by showing that the performance of Adam can be both improved and destroyed by the choice of basis in which the optimization problem is run. The majority of existing theory is invariant to the basis being used, and therefore fails to capture Adam’s advantage.
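
To make the contrast concrete, the sketch below places a plain gradient-descent step next to a standard Adam step with its usual default hyperparameters; the toy gradient with widely varying coordinate scales is only meant to suggest why Adam's per-coordinate normalization can behave very differently from gradient descent, and is not data or analysis from the thesis.

```python
# Side-by-side sketch of one gradient-descent step and one Adam step.
import torch

def gd_step(w, grad, lr):
    return w - lr * grad

def adam_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    # Per-coordinate scaling: coordinates with small gradients still take
    # meaningfully sized steps, unlike plain gradient descent.
    return w - lr * m_hat / (v_hat.sqrt() + eps)

w = torch.zeros(4)
state = {"t": 0, "m": torch.zeros(4), "v": torch.zeros(4)}
g = torch.tensor([1.0, 0.01, 0.01, 0.0001])  # widely varying per-coordinate scales
print(gd_step(w, g, lr=0.1))
print(adam_step(w, g, state))
```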

-