PhD Thesis Defense - Mir Rayat Imtiaz Hossain
Name: Mir Rayat Imtiaz Hossain
Date: Mon, October 6, 2025
Time: 1 pm
Location: ICCS 146
Supervisors: Leonid Sigal and James Little
Title: Learning Efficient Representations for Generalizable Segmentation in Data-Limited Scenarios
Abstract:
Adapting visual recognition models to novel categories with minimal supervision is important, especially for dense prediction tasks such as segmentation, where collecting pixel-level annotations is costly or impractical. These challenges are amplified in data-limited domains and under distribution shifts, which call for models that can generalize from limited supervision. This thesis focuses on learning efficient representations that enable such generalization. We present three key contributions. First, motivated by how humans use global context in visual reasoning, we propose an object-centric global reasoning module that aggregates semantically similar features into latent tokens representing object instances or parts. These tokens interact through a global reasoning mechanism that enhances contextual understanding without requiring additional instance-level annotations.
Second, we address generalized few-shot segmentation, aiming to adapt to novel classes from few examples while preserving performance on base categories learned with abundant data. We introduce learnable visual prompts and a causal cross-attention mechanism that contextualizes novel class prompts relative to base classes, while a transductive prompt-tuning strategy leverages unlabeled test images to improve test-time performance. Finally, we investigate training-free, open-vocabulary segmentation using vision-language models (VLMs) and propose an unsupervised entropy-based metric, InfoScore, to automatically select the most effective attention layers for segmentation.
Our method generalizes across diverse VLM architectures, and we show that minimal visual supervision, such as a single example per category, can significantly improve segmentation by reducing the semantic gap between class names and visual representations. Together, these contributions advance segmentation techniques toward scalable, adaptable solutions for data-limited and low-supervision settings.