PhD thesis defense - Wonho Bae

Name: Wonho Bae

Date: July 7

Time: 1:00 pm

Location: ICICS 146

Supervisor: Prof. Danica Sutherland

Thesis Title: Budget-Robust Active Learning

Abstract:
Deep learning has made significant strides in recent years, largely due to the availability of vast amounts of labeled data. However, expensive and time consuming manual annotation limits the widespread adoption of Artificial Intelligence (AI), particularly for smaller organizations and individuals. This highlights the need for data-efficient AI frameworks that reduce dependence on large-labeled datasets, making AI more accessible. Active learning, where a model strategically selects the most informative data points for annotation, offers a promising solution to this challenge. It improves model performance with fewer labeled examples, making it especially valuable in domains where labeling is costly. Recent research has revealed that the effectiveness of active learning methods varies significantly across different budget regimes, where the budget is defined by the size of a labeled set. In particular, uncertainty-based methods, which perform well in high-budget settings, often underperform compared to representation-based methods or even random sampling in low-budget regimes. In this thesis, we investigate how to improve active learning under both high- and low-budget regimes. We begin with the high-budget setting, where we introduce a novel uncertainty-based method that leverages neural tangent kernels (NTKs) to make computation of look-ahead acquisition strategies feasible. This approach allows the model to account for the changes of “future” predictions, resulting in strong performance across various datasets, particularly in the high-budget regimes. In the low-budget regimes, we propose MaxHerding, a representation-based method that generalizes the recently introduced ProbCover and establishes connections to other low-budget active learning techniques. 
To further explore active learning under limited annotation budgets, we consider its application to meta-learning (or few-shot learning) and develop a simple yet effective acquisition strategy based on Gaussian Mixture Models (GMMs), motivated by a max-margin classifier. Given the difficulty of determining the appropriate budget regime in advance, we finally propose Uncertainty Herding (UHerding), a budget-robust active learning method that adaptively interpolates between uncertainty- and representation-based strategies. Our empirical results show that UHerding consistently outperforms existing methods across a wide range of budget regimes, offering a promising step toward hyperparameter-free, more robust active learning in real-world applications.
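To illustrate the general idea of combining uncertainty with representation-based coverage, the following is a minimal toy sketch (not the thesis's UHerding implementation): a greedy selection loop where each unlabeled point's contribution is its predictive uncertainty, discounted by how well it is already covered by the selected set under an RBF similarity. All names, the kernel choice, and the gain formula here are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise RBF similarities between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def uncertainty_weighted_herding(X, uncertainty, budget, sigma=1.0):
    """Toy greedy acquisition: pick points maximizing uncertainty-weighted coverage.

    X           : (n, d) array of feature embeddings
    uncertainty : (n,) nonnegative per-point uncertainty scores
    budget      : number of points to select
    """
    K = rbf_kernel(X, X, sigma)       # similarity of every point to every point
    coverage = np.zeros(len(X))       # best similarity to any already-selected point
    selected = []
    for _ in range(budget):
        # Marginal gain: uncertainty mass newly covered by each candidate.
        gain = (uncertainty * np.maximum(K - coverage, 0.0)).sum(axis=1)
        gain[selected] = -np.inf      # never re-select a point
        i = int(np.argmax(gain))
        selected.append(i)
        coverage = np.maximum(coverage, K[i])
    return selected
```

With uniform uncertainty this reduces to a pure coverage (herding-like) criterion, while sharply peaked uncertainty makes it behave like uncertainty sampling, which is the interpolation intuition behind budget-robust acquisition.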