Bioinformatics reading group

Title:	Sequencing with short reads: Biological applications and computational challenges
Speaker:	Ryan Morin, Genome Sciences Centre

Abstract:	Three next-generation sequencing technologies have appeared in the past few years, all producing reads shorter than those of classic sequencing methods. 454, the first system to become available, routinely produces highly accurate reads of ~200-300 nucleotides (nt) (~400k reads/run). The latter two, Solexa (Illumina) and SOLiD (ABI), produce reads of 25-36 nt in much larger quantities (10-150M reads/run). The SOLiD platform also offers "mate pairs": 26 nt reads from both ends of each fragment. At the BC Genome Sciences Centre, the Solexa system is routinely used to address a diverse range of biological questions. Short reads are ideally suited to interrogating digital gene expression libraries, such as those produced by serial analysis of gene expression (SAGE) and microRNA capture techniques. The sequences produced by such experiments are mapped to known transcripts or microRNA genes, and the resulting counts are used as a measure of gene expression. These data sets also provide a resource for discovering novel transcripts and microRNAs. Various chromatin immunoprecipitation (ChIP) approaches adapted to Solexa sequencing now allow researchers to produce high-resolution maps of the genomic regions that associate with particular proteins. Short reads can also be used to profile the whole transcriptome of a cell in a "shotgun" approach, yielding much more information about variability at the exon and splicing level. Larger targets can also be "re-sequenced" by shotgun sequencing, including bacterial artificial chromosomes (BACs) carrying large genomic fragments, whole chromosomes, or potentially the entire genome. These experiments can validate genomic variation such as deletions, single nucleotide polymorphisms and potentially translocations.
However, because mammalian genomes are rich in simple and complex repeats, reads do not always map uniquely to the genome, which poses challenges for both read-to-genome mapping and de novo assembly.
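As a rough illustration of two ideas in the abstract, counting tags against known transcripts as a measure of expression and the problem repeats pose for mapping, the toy Python sketch below assigns short reads to reference sequences by exact lookup in a k-mer index. It counts only reads with a single hit as expression evidence and tallies reads that hit several locations as multi-mapping. This is not the Genome Sciences Centre's pipeline: the function names, sequences and gene names are hypothetical, and the sketch assumes fixed-length reads, exact matches and the forward strand only, whereas real aligners tolerate mismatches and search both strands.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every length-k substring of each reference sequence to its (name, offset) hits."""
    index = defaultdict(list)
    for name, seq in reference.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def count_tags(reads, reference, k):
    """Count uniquely mapping reads per reference sequence (hypothetical helper).

    Assumes every read is exactly k nt long and matches the forward strand
    exactly; reads with several hits are tallied as multi-mapping rather
    than assigned to any one sequence.
    """
    index = build_index(reference, k)
    counts = Counter()
    multi = unmapped = 0
    for read in reads:
        hits = index.get(read, [])  # .get does not add keys to the defaultdict
        if not hits:
            unmapped += 1
        elif len(hits) == 1:
            counts[hits[0][0]] += 1  # unique hit: credit that sequence
        else:
            multi += 1  # repeat-derived read: ambiguous origin
    return counts, multi, unmapped


# Hypothetical toy data: "ACGTACGT" occurs in both sequences, mimicking a repeat.
reference = {
    "geneA": "ACGTACGTTTGACCA",
    "geneB": "GGGCATTACGTACGT",
}
reads = ["ACGTACGT", "TTTGACCA", "CGTTTGAC", "GGGCATTA", "TTTTTTTT"]

counts, multi, unmapped = count_tags(reads, reference, k=8)
print(dict(counts), multi, unmapped)  # -> {'geneA': 2, 'geneB': 1} 1 1
```

Discarding multi-mapping reads, as this sketch does, is only one possible policy: it avoids double counting but under-counts expression of genes that share repeated sequence.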