Title: Enhanced RNA secondary structure prediction methods
CIHR/MSFHR bioinformatics training program, University of British Columbia
Ribonucleic acid (RNA) molecules play an important role in the functioning of the cell. The function of an RNA molecule, as of any other molecule, is determined by its three-dimensional structure, which is usually determined experimentally by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. Since both techniques are expensive, there is a large demand for accurate computational methods that predict the structure from the sequence. Unfortunately, current databases of RNA molecules whose structures have been determined by X-ray or NMR are small, so comparative methods, which require high sequence identity, perform poorly on newly discovered RNA molecules.
In the absence of large databases, an alternative is to find the minimum free energy (MFE) structure. The free energy of RNA molecules has been studied for a long time, and several models can estimate the free energy of a structure (for example, Turner's model and the INN-HB model). The general idea is that structures with lower free energy are more stable and therefore more likely to be the native structure. However, for some RNAs the native structure is far from the MFE structure.
Several algorithms for RNA secondary structure prediction have been designed. Among the best known are dynamic programming (DP) approaches such as mfold (Zuker 2003), which computes MFE structures, and CONTRAfold (Do et al. 2006), a method based on stochastic context-free grammars (SCFGs). In addition, new parameter sets for Turner's model have been published (Andronescu et al. 2007) that outperform the original parameters. Here, the goal is to combine the predictions obtained by various state-of-the-art RNA secondary structure prediction algorithms in order to obtain more accurate predictions overall across a broad range of RNAs.
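The dynamic programming idea behind MFE tools such as mfold can be illustrated with a much simpler recurrence. The sketch below is the classic Nussinov base-pair maximization, not the Turner nearest-neighbour energy model that mfold actually uses: every allowed pair simply counts as one unit of stabilization. The allowed pairs (Watson-Crick plus G-U wobble), the minimum hairpin loop of 3 unpaired bases, and the function name `nussinov` are assumptions made for illustration only.

```python
# Hypothetical illustration: Nussinov base-pair maximization, a simplified
# stand-in for energy minimization (it does NOT use Turner parameters).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin


def nussinov(seq: str) -> tuple[int, list[tuple[int, int]]]:
    """Return (maximum number of pairs, sorted 0-based (i, j) pairs)."""
    n = len(seq)
    if n == 0:
        return 0, []
    # best[i][j] = maximum number of pairs in seq[i..j]; short spans stay 0
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score

    # Traceback: recover one optimal set of pairs
    pairs: list[tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain any pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))  # j unpaired in this optimum
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + best[k + 1][j - 1] + 1 == best[i][j]:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], sorted(pairs)
```

For example, `nussinov("GGGAAAUCC")` finds a three-pair hairpin, pairing positions 0-8, 1-7 and 2-6. Real MFE methods keep the same table-filling structure but score loops and stacked pairs with measured free-energy parameters.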
In our approach, given an RNA strand, a number of prediction programs are used to generate secondary structure predictions, which are then combined on a per-base-pair basis using weighted sums in conjunction with parameter optimization techniques. Results are assessed using the recently published RNA STRAND secondary structure database (Andronescu et al. 2008).