PhD Thesis Proposal Exam - Raymond Li

Name: Raymond Li

Date: Monday, December 9

Time: 2 pm - 3 pm

Location: ICCS 104

Supervisors: Giuseppe Carenini and Gabriel Murray

Title: Exploring the Interplay Between Interpretability and Performance for Language Models

Abstract:
While predictive accuracy has traditionally been the primary metric for evaluating language models, their interpretability is equally essential for fostering transparency, accountability, and trust. This is especially important in sensitive domains, where model decisions can have significant consequences. In this thesis proposal, we explore the interplay between performance and interpretability in language models by systematically addressing the challenges of applying interpretability methods to three core language constructs (linguistic structures, factual knowledge, and latent topics) that are pivotal for natural language understanding and generation.

Linguistic structures refer to the hierarchical arrangement of language. Here, we separately address pre-defined explicit structures and implicit structures learned autonomously by the model. Specifically, we explore techniques for integrating explicit structures using interpretability methods, and strategies for effectively learning implicit structures while assessing their alignment with known linguistic frameworks.

Factual knowledge refers to a language model's ability to retain objective information acquired during large-scale pre-training. While accuracy is a crucial performance metric, the reliability of factual predictions is equally important. Here, we focus on adapting interpretability methods to quantify factual knowledge uncertainty and thereby better assess the reliability of a model's factual predictions.

Lastly, latent topics, which represent broad and abstract concepts, have traditionally been used to analyze the underlying themes of a text corpus. Here, we instead use topics as interpretable features: by inferring latent topics during language model inference, we aim both to understand model behavior and to retrieve demonstration examples that improve task accuracy.

In this proposal, by formulating research questions for each core language construct, we explore the interplay between model interpretability and performance, adapting interpretability methods to improve the overall performance and/or transparency of language models where gaps exist in current research.