Bayesian analysis of clinical array CGH profiles with robust HMMs

By Sohrab Shah


Array comparative genomic hybridization (aCGH) is a widely used, genome-wide technique for identifying chromosomal aberrations in human diseases, including cancer. Aberrations are defined as regions of increased or decreased DNA copy number relative to a normal sample. Accurately identifying the locations of these aberrations has applications in understanding the molecular mechanisms of disease as well as in developing diagnostic and prognostic tools. Unfortunately, the observed copy number changes are often corrupted by various sources of noise, making the boundaries hard to detect. One popular current technique uses hidden Markov models (HMMs) to divide the signal into regions of constant copy number called segments; a subsequent classification phase labels each segment as a gain, a loss or neutral. However, standard HMMs are sensitive to outliers, which causes over-segmentation, where segments erroneously span very short regions. Furthermore, the signals from clinical samples are complicated by the ploidy of the tumour cells and by normal/tumour admixture introduced during sample preparation. Consequently, parameter settings for the HMM that generalise well across a cohort of patients are difficult to find. Finally, the signals also contain benign copy number changes that are present in normal individuals as part of copy number variation (CNV) in the human population. Our work focuses on the development of HMMs that simultaneously segment and classify, are robust to outliers, can automatically adapt to variability in clinical samples and can incorporate prior knowledge about CNVs directly into the analysis. We systematically show how each of these contributions offers significant improvement over state-of-the-art methods on clinical samples with ground truth knowledge about the locations of aberrations. Accuracy of our methods is measured over 29 enteropathy-type T-cell lymphoma and 11 blastic mantle cell lymphoma patients with aberrations determined manually by experts.
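To make the over-segmentation problem concrete, here is a minimal sketch of a three-state HMM whose states are the labels themselves (loss, neutral, gain), decoded with the Viterbi algorithm, so that segmentation and classification happen in one pass. It is an illustration only, not the model presented in the talk: the state means, the scale, the self-transition probability, the toy log-ratio data, and the choice of a Student-t emission as the heavy-tailed alternative to a Gaussian are all assumptions made for this example.

```python
import math

# All numbers below are illustrative assumptions, not values from the talk.
STATES = ("loss", "neutral", "gain")
MEANS = (-0.5, 0.0, 0.5)   # assumed log-ratio level for each state
SCALE = 0.15               # assumed noise scale shared by all states
P_STAY = 0.99              # assumed probability of staying in the same state


def gaussian_logpdf(x, mu, scale):
    """Log-density of a Gaussian emission (sensitive to outliers)."""
    z = (x - mu) / scale
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * z * z


def student_t_logpdf(x, mu, scale, nu=3.0):
    """Log-density of a location-scale Student-t emission (heavy tails)."""
    z = (x - mu) / scale
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def viterbi(ratios, logpdf):
    """Return the most probable state label for each probe."""
    k = len(STATES)
    log_stay = math.log(P_STAY)
    log_switch = math.log((1 - P_STAY) / (k - 1))
    # Uniform initial distribution over the three states.
    score = [math.log(1 / k) + logpdf(ratios[0], MEANS[s], SCALE)
             for s in range(k)]
    back = []
    for x in ratios[1:]:
        new_score, pointers = [], []
        for s in range(k):
            cands = [score[p] + (log_stay if p == s else log_switch)
                     for p in range(k)]
            best = max(range(k), key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best] + logpdf(x, MEANS[s], SCALE))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(k), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


def count_segments(labels):
    """Number of maximal runs of identical labels."""
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


# Toy profile: a neutral region with one outlier probe, then a true gain.
ratios = [0.0] * 5 + [1.2] + [0.0] * 4 + [0.5] * 10
gauss_labels = viterbi(ratios, gaussian_logpdf)
robust_labels = viterbi(ratios, student_t_logpdf)
print(count_segments(gauss_labels), count_segments(robust_labels))
```

On this toy profile the Gaussian version assigns the single outlier probe its own one-probe "gain" segment, while the heavy-tailed version keeps it inside the neutral segment and still detects the true gain. The other ideas in the abstract (adapting parameters per patient and using CNV priors) are not covered by this sketch.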

This is joint work with Kevin Murphy, Raymond Ng, Wan Lam and Ron deLeeuw.
