MSc Thesis Presentation - Kaiyun Guo
Name: Kaiyun Guo
Date: Wednesday, August 28th
Time: 3:00 pm
Location: ICCS 202
Supervisor: Jiarui Ding
Title:  ProtoCloud: A Generative Prototypical Self-explanatory Model for Cell Type Annotation
Abstract: 
Single-cell RNA sequencing (scRNA-seq) has fundamentally transformed our understanding of cellular diversity within tissues. However, scRNA-seq data present several challenges, including complex batch effects, varying sequencing depths, and ambient RNA contamination. Effective data analysis requires robust computational tools, with cell type annotation being a critical component. As scRNA-seq datasets grow in scale, several deep-learning models have been developed to offer automated and accurate cell type annotation. However, the lack of interpretability and explainability in these black-box models makes it difficult to understand the reasoning behind their decisions and raises concerns about the reliability of their results.
To address this drawback, we propose ProtoCloud, a generative self-explaining model based on prototypical variational autoencoders. The model simultaneously learns representative cell-type prototypes, ensuring interpretability throughout the training process. Additionally, it quantifies the certainty of each prediction by calculating a similarity score between the cell embedding and the prototypes. The classification results demonstrate that ProtoCloud outperforms non-interpretable state-of-the-art methods in cell type annotation, as evidenced by accuracy, macro F1 score, and Cohen's kappa score on seven in vivo datasets. To facilitate the explainability of ProtoCloud, this thesis extends layerwise relevance propagation (LRP) to encompass the case of linear layers. By applying LRP to prototypes, ProtoCloud not only highlights known marker genes but also identifies potentially relevant genes and transcriptional signatures. The model was tested under a variety of experimental conditions, including datasets from different sequencing technologies and organs. Its performance proved robust and consistent, and its explanations accurate and reliable. Finally, ProtoCloud was applied to transfer cell type annotations in two disease contexts: 1) across seven time points following optic nerve crush on mouse retinal ganglion cells, and 2) in human eosinophilic esophagitis (a chronic, progressive, allergy-mediated inflammatory disease of the esophagus), across three disease conditions from three studies. These applications demonstrate ProtoCloud's broad potential for future use in various research and clinical settings.
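
For readers unfamiliar with prototype-based classifiers, the idea of scoring a prediction by the similarity between a cell embedding and class prototypes can be sketched as follows. This is an illustrative sketch only, not ProtoCloud's implementation: the log-ratio similarity is the form commonly used in prototype networks and is assumed here, and the function names, the softmax over per-class scores, and the toy 2-D prototypes are all hypothetical.

```python
import numpy as np

def prototype_similarity(z, prototypes, eps=1e-4):
    """Similarity of one embedding z (d,) to each prototype (k, d).

    Assumed form: log((d^2 + 1) / (d^2 + eps)), which is large when the
    squared distance d^2 is near zero and approaches 0 as it grows.
    """
    sq_dist = np.sum((prototypes - z) ** 2, axis=1)
    return np.log((sq_dist + 1.0) / (sq_dist + eps))

def predict_with_certainty(z, prototypes, proto_labels, n_classes):
    """Return (predicted class, certainty in (0, 1]) for one embedding."""
    sims = prototype_similarity(z, prototypes)
    # Score each class by its best-matching prototype.
    class_scores = np.array(
        [sims[proto_labels == c].max() for c in range(n_classes)]
    )
    # Numerically stable softmax turns the scores into a certainty value.
    shifted = class_scores - class_scores.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    label = int(np.argmax(probs))
    return label, float(probs[label])

# Toy example: two hypothetical cell types with two prototypes each.
prototypes = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.8]])
proto_labels = np.array([0, 0, 1, 1])
cell = np.array([0.1, 0.0])  # embedding close to the class-0 prototypes

label, certainty = predict_with_certainty(cell, prototypes, proto_labels, 2)
print(label, round(certainty, 3))
```

A cell whose embedding lies far from every prototype yields nearly equal class scores and therefore a certainty close to chance, which is what makes a score of this kind usable as a flag for uncertain annotations.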