*****************************************************

Lectured by Dr. Anne Condon, Notes taken by Kevin Chen

*****************************************************

**1.** Motivation:

· RNA plays regulatory, catalytic and structural roles in cells

· Used in phylogenetic tree reconstruction

**2.** To understand the structure and function of an RNA molecule, we have two goals:

(1) To align RNA sequences.

(2) To determine the secondary structure of an RNA sequence.

· Primary structure: the base sequence (e.g. 5'-AUCGUAA......CGU-3')

· Secondary structure: the set of base pairs (e.g. C-G, A-U), which
largely determines the 3D (tertiary) structure of the molecule.

· Example :

Given RNA : 5' - AUCCAAAGGAU - 3'

Denoted by : 5' - S1 S2 S3 ... Sn - 3' (here n = 11)

The secondary structure is a set S of base pairs (i, j) with 1 ≤ i < j ≤ n.

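If the stem AUCC (positions 1-4) pairs with GGAU (positions 8-11), forming a
hairpin around the loop AAA, then S = { (1,11) , (2,10) , (3,9) , (4,8) }

If instead C4 pairs with G8 and A7 pairs with U11, then S = { (4,8) , (7,11) }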

Note: in this case S = { (i,j), (i',j') } with **i < i' < j < j'**, so the
two base pairs cross. Such a structure is called a **"pseudoknot"**.
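The crossing condition can be checked mechanically. The following is a minimal Python sketch; the function name `has_pseudoknot` is illustrative and not from the lecture. It assumes base pairs are given as 1-based (i, j) tuples with i < j.

```python
from itertools import combinations

def has_pseudoknot(pairs):
    """Return True if two base pairs (i, j) and (i2, j2) cross: i < i2 < j < j2."""
    # Sorting puts the pair with the smaller left end first, so i <= i2 below.
    for (i, j), (i2, j2) in combinations(sorted(pairs), 2):
        if i < i2 < j < j2:
            return True
    return False

nested = {(1, 11), (2, 10), (3, 9)}   # nested stem pairs
crossing = {(4, 8), (7, 11)}          # the pseudoknot example above
print(has_pseudoknot(nested))    # False
print(has_pseudoknot(crossing))  # True
```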

**3.** Goals (1) and (2) are interrelated:

· Base-pairing interactions in an RNA molecule cause long-range dependencies
between nucleotides in the molecule. This implies that the additive scoring
systems used in pairwise alignment, which score each position independently,
do not work well for RNA.

· However, if we know the secondary structure, then we know where the base
pairs occur, and this helps us align the RNA sequences.

· That is, alignment methods that take the secondary structure into account
are preferred.

· Conversely, RNA sequence alignment can be used to help determine RNA
secondary structure. This is called "**Comparative Analysis**".

· **Comparative analysis:** (refer to Durbin's BSA, Chap. 10)

· Given several closely related sequences, iterate the following 2 steps:

1. Align the sequences (based on the current guess at the structure).

2. Guess the base pairs in the structure (based on the current best
alignment).

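· To accomplish step 2, we analyse the **"Mutual Information"** **M**ij
between 2 aligned columns i and j.

· Definition of **M**ij :

$$M_{ij} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}$$

where $f_{x_i}$ is the frequency of base $x_i$ in column i, $f_{x_j}$ is the
frequency of base $x_j$ in column j, and the sum runs over all 16 ordered
pairs of bases (pairs with frequency 0 contribute 0).

· How do we calculate $f_{x_i x_j}$ (the frequency of the ordered pair
$(x_i, x_j)$ in columns i and j)?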

For example :

Given

-- i ------ j --

--A------U--

--A------C--

--A------U--

--C------G--

fAU = 2/4

fAC = 1/4

fCG = 1/4

fCA = 0/4 (order matters: the first base is from column i, the second from
column j)
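The counting above can be sketched in a few lines of Python. The function name `pair_frequencies` is illustrative, not from the lecture, and each column is assumed to be passed as a string with one character per aligned sequence.

```python
from collections import Counter

def pair_frequencies(col_i, col_j):
    """Frequency of each ordered pair (x_i, x_j) over the rows of two aligned columns."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of rows")
    n = len(col_i)
    counts = Counter(zip(col_i, col_j))  # counts ordered pairs row by row
    return {pair: c / n for pair, c in counts.items()}

# Columns i and j from the four-sequence example above.
f = pair_frequencies("AAAC", "UCUG")
print(f[("A", "U")])           # 0.5
print(f[("C", "G")])           # 0.25
print(f.get(("C", "A"), 0.0))  # 0.0 (order matters: C in column i, A in column j)
```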

· Intuitively, Mij is the amount of information (in bits) revealed about
position j if you are told what is in position i.

· The following example shows "completely correlated" base pairs:

i - j

A-U

U-A

C-G

G-C

fAU = fUA = fCG = fGC = 1/4, and each base has frequency 1/4 in each column.

Mij = 4 * ( 1/4 * log2( (1/4) / ( (1/4)(1/4) ) ) ) + 12 * ( 0 ) = 2 bits

(Note: there are 16 possible ordered pairs (xi, xj) with xi, xj in
{ A, U, G, C }; only 4 of them occur in this example, and the remaining 12
contribute zero.)
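As a check on this arithmetic, here is a small Python sketch of the mutual information between two aligned columns. The function name `mutual_information` is illustrative, not from the lecture. It sums only over pairs that actually occur, since pairs with zero frequency contribute nothing.

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two aligned columns given as strings."""
    n = len(col_i)
    f_i = Counter(col_i)              # single-base counts in column i
    f_j = Counter(col_j)              # single-base counts in column j
    f_ij = Counter(zip(col_i, col_j)) # ordered-pair counts
    m = 0.0
    for (x, y), c in f_ij.items():
        p_xy = c / n
        m += p_xy * log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return m

# The completely correlated example: A-U, U-A, C-G, G-C.
print(mutual_information("AUCG", "UAGC"))  # 2.0
```

At the other extreme, two fully conserved columns (for example all A in column i and all U in column j) give 0 bits, because knowing position i reveals nothing new about position j.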

**RNA secondary structure prediction** (for a single-stranded RNA sequence)

· Use a measure of the stability of a secondary structure for the given RNA
strand, namely the predicted free energy of the structure.

· Loops tend to be destabilizing: they contribute positive (+ve) free
energy.

· Assuming no pseudoknots, the free energy is the sum of the free energies
of the individual loops and stacked pairs.

· This turns our problem into "finding the secondary structure with
minimum free energy, taken over all secondary structures for the input
molecule".

· For details of the 4 free energy functions, please see the notes from
Lecture 16 of the Computational Biology course at U of W:

http://www.cs.washington.edu/education/courses/527/00wi/

Here's a brief summary of what Anne mentioned in class:

1. eS(i,j): free energy of the stacked pair (i,j), (i+1,j-1), depending on
Si, Sj, Si+1 and Sj-1.

2. eH(i,j): free energy of the hairpin loop closed by (i,j), depending on
Si, Sj, j-i (length), Si+1 and Sj-1.

3. eL(i,j,i',j'): free energy of the internal loop closed by (i,j) and
(i',j'), depending on Si, Sj, Si', Sj', Si+1, Sj-1, Si'-1, Sj'+1, i'-i and
j-j'.

4. eM(i, j, ..., ik, jk): free energy of the multibranched loop closed by
(i,j). This one is not well understood.

· A **dynamic programming approach** to finding the optimal secondary
structure was briefly mentioned in class. Anne suggested seeing the details
in the U of W lecture notes (Lecture 16):
http://www.cs.washington.edu/education/courses/527/00wi/

Here's what Anne mentioned in class:

Let W(j) be the free energy of the optimal secondary structure for the
prefix S1 S2 ... Sj.

There are 2 possibilities for Sj:

(1) If Sj is not paired, W(j) = W(j-1), since an unpaired base does NOT
contribute to the overall free energy.

(2) If Sj is paired in the optimal secondary structure, say to Si where
i < j, then W(j) = W(i-1) + V(i,j), where V(i,j) is the free energy of the
optimal structure of Si...Sj, assuming (i,j) forms a base pair in the
structure.

W(j) is the minimum over case (1) and all choices of i in case (2), with
W(0) = 0 for the empty prefix.
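To make the recurrence concrete, here is a rough Python sketch under a deliberately simplified energy model. These are assumptions made for illustration only, not the loop-based functions eS, eH, eL and eM from the lecture: every Watson-Crick pair contributes -1, loops contribute 0, and a hairpin loop must contain at least 3 unpaired bases. The helper `E` and the constants `PAIRS` and `MIN_LOOP` are likewise illustrative names.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin

def min_free_energy(seq):
    """Toy minimum free energy of seq: -1 per base pair, no loop terms."""

    @lru_cache(maxsize=None)
    def V(i, j):
        """Best energy of S_i..S_j given that (i, j) is a base pair (1-based)."""
        if j - i <= MIN_LOOP or (seq[i - 1], seq[j - 1]) not in PAIRS:
            return float("inf")  # (i, j) cannot form a pair
        return -1 + E(i + 1, j - 1)

    @lru_cache(maxsize=None)
    def E(i, j):
        """Best energy of S_i..S_j, using the same two cases as W on an interval."""
        if j <= i:  # empty interval or a single base
            return 0
        best = E(i, j - 1)             # case 1: S_j unpaired
        for k in range(i, j):          # case 2: S_j paired with S_k
            best = min(best, E(i, k - 1) + V(k, j))
        return best

    n = len(seq)
    W = [0] * (n + 1)                  # W[0] = 0 for the empty prefix
    for j in range(1, n + 1):
        W[j] = W[j - 1]                # case 1: S_j unpaired
        for i in range(1, j):          # case 2: S_j paired with S_i
            W[j] = min(W[j], W[i - 1] + V(i, j))
    return W[n]

print(min_free_energy("AUCCAAAGGAU"))  # -4: four nested pairs around the AAA loop
```

With a realistic energy model, V(i,j) would instead be computed from the stacked-pair, hairpin, internal-loop and multibranched-loop terms; only the outer W recurrence would stay as written.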

**· References for this lecture:**

(1) Durbin's Biological Sequence Analysis, Chap. 10.

(2) Lecture 16 notes from the U of W CSE 527 class: http://www.cs.washington.edu/education/courses/527/00wi/