Title: Statistical models for genomic array data analysis
DNA copy number alterations (CNAs) are a hallmark of somatic mutations
in tumor genomes and congenital abnormalities that lead to diseases such
as mental retardation. CNAs define regions on a given chromosome that
exhibit a deletion or amplification of the DNA within the region.
Accurately identifying the locations of CNAs in an individual sample has
applications in understanding the molecular mechanisms of disease as
well as the development of diagnostic and prognostic tools. Furthermore,
identifying the pattern of recurrent CNAs that occur in a set of samples
exhibiting a common phenotype has compelling implications for medical
advances. Recent progress in array comparative genomic hybridization
(aCGH) has enabled researchers to measure CNAs at high resolution for
the entire human genome. Unfortunately, the observed copy number changes
are often corrupted by various sources of noise, making the CNAs hard to
detect.
In this talk I will explore model-based approaches to the detection of CNAs in aCGH data. I will describe four main areas of research: CNA detection given a sample from one individual; joint analysis of aCGH data from a set of samples to detect recurrent CNAs; unsupervised clustering of aCGH data; and integration of aCGH data with methylation arrays, a promising new technique for detecting so-called epigenomic changes. I will systematically describe how novel extensions to hidden Markov models (HMMs), applied to the first two of these research goals, lead to improved results over baseline models on both cell line and clinical data. Furthermore, I will show how the work to date provides a robust statistical framework upon which to develop our novel ideas for the latter two research goals.