Lecture on RNA secondary structure prediction
Module 5, Part 2
[by Mirela Andronescu]
--------------
5.4 Free energy of RNA secondary structure
Thermodynamic free energy determines stability and likelihood of secondary structure.
How to estimate free energy for a given secondary structure S of a strand s?
Simplest idea: weighted sum of GC and AU pairs:
- can capture different energy contributions for GC, AU pairs
- basic assumption: energy contributions by individual pairs are independent and additive
- problem: does not capture major source of stabilising energy contributions through stacking interactions between base pairs (and pairs + neighbouring single bases)
- problem: does not capture destabilising contributions of loops (these can be very significant)
Example: (from slide)
17*GC + 2*AU
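A minimal sketch of the pair-counting idea, assuming the structure is given in dot-bracket notation. The weights w_gc and w_au are hypothetical placeholders, not measured values; the point is only that each pair contributes independently of its neighbours.

```python
def parse_pairs(structure):
    """Return the (i, j) base pairs of a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs


def pair_count_energy(seq, structure, w_gc=-3.0, w_au=-2.0):
    """Weighted sum n_GC * w_gc + n_AU * w_au.

    w_gc and w_au are illustrative placeholder weights (assumption),
    not experimentally determined parameters.
    """
    n_gc = n_au = 0
    for i, j in parse_pairs(structure):
        pair = {seq[i], seq[j]}
        if pair == {"G", "C"}:
            n_gc += 1
        elif pair == {"A", "U"}:
            n_au += 1
    return n_gc, n_au, n_gc * w_gc + n_au * w_au


if __name__ == "__main__":
    print(pair_count_energy("GAAAAAUC", "((....))"))
```

Note that the loop of four unpaired bases does not enter the sum at all, which is exactly the second problem listed above.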
Nearest neighbour model - basic idea:
- captures stacking interactions between neighbouring base pairs
- basic assumption: energy contributions by neighbouring base pairs are independent and additive
Example: (from slide)
GC,CG + CG,CG + CG,GC + GC,CG + ... + CG,UA + UA,GC
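The sum above can be enumerated mechanically: one term per pair (i, j) that is directly followed by the pair (i+1, j-1). A minimal sketch, assuming dot-bracket input; the label format "XY,WZ" mirrors the notation of the example, and the energy table passed to stacking_energy is a hypothetical placeholder to be filled from real stacking tables.

```python
def stacked_pairs(seq, structure):
    """List one label per stack of two directly adjacent base pairs.

    A label "XY,WZ" means pair X-Y (positions i, j) is followed by
    pair W-Z (positions i+1, j-1).
    """
    partner, stack = {}, []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            partner[stack.pop()] = j  # opening index -> closing index
    terms = []
    for i in sorted(partner):
        j = partner[i]
        if partner.get(i + 1) == j - 1:
            terms.append(f"{seq[i]}{seq[j]},{seq[i + 1]}{seq[j - 1]}")
    return terms


def stacking_energy(terms, table):
    """Sum the contributions of all stacks; table maps label -> energy."""
    return sum(table[t] for t in terms)


if __name__ == "__main__":
    terms = stacked_pairs("GCGAAAACGC", "(((....)))")
    print(" + ".join(terms))
```

A helix of k consecutive pairs yields k-1 stacking terms, so the innermost pair closing the hairpin contributes no term of its own.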
Turner energy parameters
- capture energy contributions of stacking interactions and various types of loops in the form of a large number of experimentally determined parameters
- do not model pseudoknots
Details of the Turner Group Energy model:
(see http://www.bioinfo.rpi.edu/~zukerm/rna/energy/node2.html)
- Stacking Energies:
5'-WX-3'
3'-ZY-5'
(specified as tables for WXYZ combinations, including terminal mismatched pairs;
additional tables for single base stacking)
- Hairpin Loop Energies:
(closed by one pair)
entropic term, depending on loop size
+ hairpin loop terminal stacking energy (depending on closing pair and following mismatched pair)
+ special bonus energy (e.g., for tetra loops = size 4)
(also specified in the form of tables for the various cases)
- Interior and Bulge Loop Energies:
(closed by two pairs)
entropic term, depending on loop size
+ terminal stacking energies
(depending on both closing pairs and following mismatched pairs)
+ asymmetry penalty (for asymmetric loops)
Note: special rules for small interior loops 2sym, 3asym, 4sym
- Multi-branched Loops:
(closed by > 2 pairs - not well understood)
E = a + n1 * b + n2 * c
where n1=number of unpaired bases, n2=number of branches
(a,b,c) = (4.6, 0.4, 0.1)
a= offset (empirically derived const)
b= base penalty (empirically derived const)
c= helix penalty (empirically derived const)
Note: the form of this equation is not motivated by modelling accuracy, but in order to facilitate efficient prediction!
+ energy contributions through single base stacking
(interaction between terminal pair of stem and adjacent unpaired base)
(There is a more accurate energy calculation for multi-loops - Jacobson-Stockmayer theory - but this is typically not used in secondary structure prediction)
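The linear multi-loop term can be written as a one-line function. A minimal sketch using the constants (a, b, c) = (4.6, 0.4, 0.1) quoted above as defaults; the function and argument names are illustrative.

```python
def multiloop_energy(n_unpaired, n_branches, a=4.6, b=0.4, c=0.1):
    """Linear multi-loop term E = a + n1 * b + n2 * c.

    n_unpaired is n1 (unpaired bases in the loop),
    n_branches is n2 (branches of the loop).
    """
    return a + n_unpaired * b + n_branches * c


if __name__ == "__main__":
    # Adding one unpaired base always costs exactly b, whatever the loop
    # looks like otherwise.
    print(multiloop_energy(6, 3) - multiloop_energy(5, 3))
```

Because every extra unpaired base or branch adds a fixed amount, a dynamic programming algorithm can extend a partial multi-loop without knowing its final size, which is the efficiency argument made in the note above.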
- Free bases (exterior loop):
energy contributions through single base stacking
(additional energy contributions through coaxial stacking of adjacent helices, typically not considered in predictions)
Turner's free energy set contains 12503 parameters, 7589 unique parameters.
Note: free energy is temperature dependent and typically specified for 37 deg C
Example: (from slide)
\Delta G(S) = \Delta G(H1) + \Delta G(S1) + \Delta G(B)
+ \Delta G(S2) + \Delta G(E) + \Delta G(S3)
+ ... + \Delta G(S6) + \Delta G(H3)
= 4.1 - 3.4 + 3.1 - 3.5 - 3.1 - 11.7 + ... - 5.8 + 4.1 = -22.5
(very similar considerations apply to DNA secondary structure as well as to base pairing between multiple strands of DNA or RNA or hybrids of DNA and RNA)
Efficient (linear-time) computation of Turner free energy
-> course project by Rastegari et al.
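The first step of any such computation is to split the structure into its loops, since the total free energy is the sum of one term per loop. The following is a minimal sketch of that decomposition for pseudoknot-free dot-bracket input; it is not the implementation from the course project, and it only classifies the loops (looking up the energy of each loop in the parameter tables is left out). Each position is visited once, so the running time is linear in the sequence length.

```python
def loop_decomposition(structure):
    """Classify every loop of a pseudoknot-free dot-bracket structure.

    Returns a list of (kind, closing_pair, n_unpaired) tuples, where kind is
    one of "exterior", "hairpin", "stack", "bulge", "interior", "multi".
    """
    n = len(structure)
    partner = [-1] * n
    stack = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i

    def scan(start, end):
        """Collect branches and count unpaired bases in positions start..end-1."""
        branches, unpaired = [], 0
        k = start
        while k < end:
            if partner[k] > k:           # opening base: jump over the helix
                branches.append((k, partner[k]))
                k = partner[k] + 1
            else:                        # unpaired base of this loop
                unpaired += 1
                k += 1
        return branches, unpaired

    branches, unpaired = scan(0, n)
    loops = [("exterior", None, unpaired)]
    todo = list(branches)
    while todo:
        i, j = todo.pop()
        branches, unpaired = scan(i + 1, j)
        if not branches:
            kind = "hairpin"
        elif len(branches) == 1:
            p, q = branches[0]
            left, right = p - i - 1, j - q - 1
            if left == 0 and right == 0:
                kind = "stack"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "interior"
        else:
            kind = "multi"
        loops.append((kind, (i, j), unpaired))
        todo.extend(branches)
    return loops


if __name__ == "__main__":
    for loop in loop_decomposition("((..((...))..((...))..))."):
        print(loop)
```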
--
5.5 The dynamic programming algorithm by Zuker & Stiegler (Z&S)
[Slide with the possible structures]
[Slide with the basic idea]
The algorithm is clearly explained in:
Lecture notes from CSE 527, Winter 2000 (Martin Tompa) at U Washington,
http://www.cs.washington.edu/education/courses/527/00wi/lectures/lect16.pdf
http://www.cs.washington.edu/education/courses/527/00wi/lectures/lect17.pdf
[Again the slide with the basic idea, showing the example]
Implementations: mfold (Michael Zuker web page) and Vienna RNA package
Accuracy: pretty good for short sequences (< 100 nt), not so good for long ones
2-4 rule: doesn't do it, especially for long sequences
If pseudoknots are allowed, the prediction problem becomes NP-hard
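To illustrate the shape of the recursion, here is a minimal sketch of a deliberately simpler relative of Z&S: Nussinov-style base-pair maximisation. It is not the Z&S algorithm (it counts pairs instead of minimising loop-based free energy), but it fills a table over subsequences i..j in the same way, from short spans to long ones. The minimum hairpin size min_loop=3 and the inclusion of GU wobble pairs are assumptions of this sketch.

```python
def can_pair(x, y):
    """Watson-Crick pairs plus the GU wobble pair (assumption of this sketch)."""
    return {x, y} in ({"G", "C"}, {"A", "U"}, {"G", "U"})


def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, by dynamic programming.

    best[i][j] holds the optimum for the subsequence seq[i..j] (inclusive).
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                  # case 1: j is unpaired
            for k in range(i, j - min_loop):        # case 2: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_pairs("GGGAAAUCC"))
```

The three nested loops give O(n^3) time and the table takes O(n^2) space. Because the recursion only ever combines disjoint or nested subsequences, it cannot represent crossing pairs, which is why pseudoknots fall outside this scheme.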
--
[5.6 and 5.7 were only covered very briefly in class]
5.6 Dynamic programming algorithm for RNA duplexes
[Slide with the additional special cases of structures]
- give the basic idea: very similar to Z&S
--
5.7 RNA structure design (here, just talked about the problem)
- explain idea of SLS
- what is a compatible sequence
- it appears to be hard, but no one knows the complexity
- general idea of SLS on sequences
- other approach: Vienna inverse folder
- [Structure slide] and explain the splitting
Algorithm: initialization, splitting, SLS on substructures
- merge substructures together: does not necessarily fold into the desired structure
- percentage of GC in pairs and in non-paired bases
- show the slide again and show the percentage of GC pairs
- software accessible online at www.RNAsoft.ca
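Two of the notions above are easy to make concrete. A minimal sketch, assuming dot-bracket structures and assuming that a sequence counts as compatible when every paired position holds a GC, AU or GU pair; the function names are illustrative. Compatibility is only a necessary condition: a compatible sequence can form the target structure but need not fold into it.

```python
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return the (i, j) base pairs of a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs


def is_compatible(seq, structure):
    """True if every pair of the structure can be formed by the sequence."""
    if len(seq) != len(structure):
        return False
    return all((seq[i], seq[j]) in CANONICAL for i, j in base_pairs(structure))


def gc_pair_fraction(seq, structure):
    """Fraction of the structure's base pairs that are GC pairs."""
    pairs = base_pairs(structure)
    if not pairs:
        return 0.0
    gc = sum(1 for i, j in pairs if {seq[i], seq[j]} == {"G", "C"})
    return gc / len(pairs)


if __name__ == "__main__":
    target = "(((...)))"
    print(is_compatible("GGGAAAUCC", target), gc_pair_fraction("GGGAAAUCC", target))
```

A local search over sequences would typically start from a compatible sequence and keep it compatible, changing paired positions two at a time.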
-----------
Resources:
Durbin et al., Ch.10
Lecture notes from CSE 527, Winter 2000 (Martin Tompa) at U Washington,
http://www.cs.washington.edu/education/courses/527/00wi/lectures/lect16.pdf
http://www.bioinfo.rpi.edu/~zukerm/seqanal-old/node1.html#SECTION00010000000000000000
Further Reading:
http://www.bioinfo.rpi.edu/~zukerm/rna/energy/
http://www.bioinfo.rpi.edu/~zukerm/rna/energy/node2.html
-> Turner free energy model