PhD Thesis Proposal - Wanxin Li
Name: Wanxin Li
Date: Nov 26, 2024
Time: 2 pm - 4 pm
Location: ICICS 146
Supervisors: Anne Condon, Khanh Dao Duc
Title: Computational Methods for Addressing Bias and Fairness and Analyzing Cell Shape Heterogeneity
Abstract:
This thesis proposal focuses on two problems arising from biomedical data: (i) addressing bias and fairness in healthcare systems, and (ii) improving metrics and dimensionality reduction for cell shape heterogeneity analysis.
Part (i): As healthcare systems adopt Electronic Health Records (EHRs), opportunities for algorithmic applications grow, but so do two challenges. First, EHRs from different populations can introduce biases, making data and models less transferable. To address this, we developed a method that leverages optimal transport (OT) to transfer knowledge between populations, comes with suitability guarantees, and enables the quantification of treatment disparities. Second, algorithms using EHRs may propagate unfairness. While fairness testing methods have been proposed for binary classification, they often lack computational tractability. We aim to propose a framework based on OT projections for testing the fairness of regression models under a wide range of fairness criteria while maintaining computational tractability. We will apply the framework to test the fairness of critical healthcare algorithms, such as emergency room wait time prediction.
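To illustrate the basic idea of using OT to move data from one population toward another, here is a minimal sketch. It is not the method developed in the thesis: it solves the standard discrete OT problem between two small samples as a linear program and then applies a barycentric projection. The function names `ot_plan` and `barycentric_map`, the uniform sample weights, and the squared Euclidean cost are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def ot_plan(xs, xt):
    """Exact discrete OT plan between two samples with uniform weights.

    xs: (n, d) source samples, xt: (m, d) target samples.
    Uses a squared Euclidean cost (an illustrative choice).
    Returns the (n, m) transport plan and the optimal cost.
    """
    n, m = len(xs), len(xt)
    cost = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(axis=2)
    # The plan is flattened row-major, so entry (i, j) sits at index i * m + j.
    # Row sums: each source point ships mass 1/n.
    a_rows = np.kron(np.eye(n), np.ones((1, m)))
    # Column sums: each target point receives mass 1/m.
    a_cols = np.kron(np.ones((1, n)), np.eye(m))
    a_eq = np.vstack([a_rows, a_cols])
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(n, m), res.fun


def barycentric_map(plan, xt):
    """Send each source point to the plan-weighted average of target points."""
    row_mass = plan.sum(axis=1, keepdims=True)
    return (plan @ xt) / row_mass


# Toy example: two one-dimensional "populations" that differ by a shift.
source = np.array([[0.0], [1.0]])
target = np.array([[10.0], [11.0]])
plan, total_cost = ot_plan(source, target)
mapped = barycentric_map(plan, target)
```

In this toy example the optimal plan matches the points in order, so `mapped` coincides with `target`. A linear program of this kind scales poorly with sample size; it is used here only because it needs nothing beyond NumPy and SciPy.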
Part (ii): Cells grown on planar surfaces exhibit diverse morphologies due to genetic or environmental factors. We explored how alternative metrics, beyond the Euclidean metric, can offer insights into cell shape heterogeneity. The Square Root Velocity (SRV) metric, a specific instance of the elastic metric, is known for its computational efficiency in practice. While existing studies have restricted the use of the SRV metric to either simple shapes or basic tasks, we explored it more extensively, both by comparing distances to the mean shape under SRV and by using SRV as the metric for multi-dimensional scaling (MDS). Our study showed that SRV outperforms the linear metric on datasets of human cancer cells. Additionally, the presence of orthogonal outliers can significantly distort results obtained with both Euclidean and non-Euclidean metrics. To address this, we developed a method that detects and corrects orthogonal outliers in MDS while estimating the dataset's dimensionality. We validated the effectiveness of this method on single-cell images, microbiome sequencing data, and scRNA-seq data.
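As a rough illustration of how an SRV-based distance can feed into MDS, the sketch below computes the SRV representation q = c' / sqrt(|c'|) of a discretized curve, takes the L2 distance between two such representations, and embeds a distance matrix with classical MDS. It is a simplified sketch under stated assumptions, not the pipeline used in the thesis: it does not optimize over rotations or reparametrizations, as a full elastic shape distance would, and it does not include the outlier detection and correction step. The function names `srv`, `srv_distance`, and `classical_mds` are hypothetical.

```python
import numpy as np


def srv(curve):
    """SRV representation of a discrete curve of shape (k, d) sampled on [0, 1]."""
    k = len(curve)
    velocity = np.diff(curve, axis=0) * (k - 1)  # finite-difference derivative
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    # Guard against division by zero on stationary segments.
    return velocity / np.sqrt(np.maximum(speed, 1e-12))


def srv_distance(c1, c2):
    """L2 distance between SRV representations (no rotation or
    reparametrization alignment, which a full shape distance would add)."""
    q1, q2 = srv(c1), srv(c2)
    return np.sqrt(((q1 - q2) ** 2).sum() / len(q1))


def classical_mds(dist, dim=2):
    """Classical MDS: double-center squared distances, keep top eigenpairs."""
    n = len(dist)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1][:dim]
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))


# Toy example: three straight segments of different lengths.
t = np.linspace(0.0, 1.0, 50)[:, None]
curves = [np.hstack([length * t, np.zeros_like(t)]) for length in (1.0, 4.0, 9.0)]
dist = np.array([[srv_distance(a, b) for b in curves] for a in curves])
embedding = classical_mds(dist, dim=1)
```

Because the SRV representation depends only on the derivative of the curve, the distance is unchanged when both curves are translated. For these straight segments the SRV norm is the square root of the length, so the three curves land at evenly spaced positions in the one-dimensional embedding.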