Finding local RNA motifs using covariance models

ID
TR-2006-06
Authors
Sohrab P. Shah and Anne E. Condon
Publishing date
April 03, 2006
Length
29 pages
Abstract
We present DISCO, an algorithm to detect conserved motifs in sets of unaligned RNA sequences. Our algorithm uses covariance models (CMs) to represent motifs. We introduce a novel approach to initialise a CM using pairwise and multiple sequence alignment. The CM is then iteratively refined. We tested our algorithm on 26 data sets derived from Rfam seed alignments of microRNA (miRNA) precursors and conserved elements in the untranslated regions of mRNAs (UTR elements). Our algorithm outperformed RNAProfile and FOLDALIGN in measures of sensitivity and positive predictive value, although RNAProfile ran considerably faster. The accuracy of our algorithm was unaffected by properties of the input data, and the algorithm performed consistently under different settings of key parameters. The running time of DISCO is O(N²L²W² + NL³), where W is the approximate width of the motif, L is the length of the longest sequence in the input data, and N is the number of sequences. Supplemental material is available at: http://www.cs.ubc.ca/~sshah/disco.
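
To give a feel for the stated running-time bound, the following minimal Python sketch evaluates the two terms of O(N²L²W² + NL³) for given input sizes and reports which one is larger. It is only an illustration of the formula in the abstract: the function names are hypothetical, constant factors are ignored, and nothing here is taken from the DISCO implementation.

```python
def disco_cost_terms(n, l, w):
    """Return the two terms of the bound O(N^2 L^2 W^2 + N L^3).

    n: number of sequences (N)
    l: length of the longest sequence (L)
    w: approximate motif width (W)
    Constant factors are ignored, so the values are only comparable
    to each other, not to wall-clock time.
    """
    first_term = n**2 * l**2 * w**2
    second_term = n * l**3
    return first_term, second_term


def dominant_term(n, l, w):
    """Name the larger of the two terms for the given sizes."""
    first_term, second_term = disco_cost_terms(n, l, w)
    return "N^2 L^2 W^2" if first_term >= second_term else "N L^3"


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the report's data sets.
    for n, l, w in [(10, 200, 20), (2, 1000, 10)]:
        print(n, l, w, disco_cost_terms(n, l, w), dominant_term(n, l, w))
```

Dividing both terms by NL² shows that the first term is the larger one exactly when N·W² ≥ L, so for more than a handful of sequences of moderate length the N²L²W² term would be expected to govern the bound.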