Bayesian RNA secondary structure prediction and sampling

Title:	Bayesian RNA secondary structure prediction and sampling
Speaker:	Mirela Andronescu

Abstracts	[1] A Bayesian statistical algorithm for RNA secondary structure prediction. Ye Ding and Charles E. Lawrence. Computers & Chemistry, Volume 23, Issues 3-4, 15 June 1999, Pages 387-400.
	A Bayesian approach for predicting RNA secondary structure that addresses the following three open issues is described: (1) the need for a representation of the full ensemble of probable structures; (2) the need to specify a fixed set of energy parameters; (3) the desire to make statistical inferences on all variables in the problem. It has recently been shown that Bayesian inference can be employed to relax or eliminate the need to specify the parameters of bioinformatics recursive algorithms and to give a statistical representation of the full ensemble of probable solutions with the incorporation of uncertainty in parameter values. In this paper, we make an initial exploration of these potential advantages of the Bayesian approach. We present a Bayesian algorithm that is based on stacking energy rules but relaxes the need to specify the parameters. The algorithm returns the exact posterior distribution of the number of destabilizing loops, stacking energy matrices, and secondary structures. The algorithm generates statistically representative structures from the full ensemble of probable secondary structures in exact proportion to the posterior probabilities. Once the forward recursions for the algorithm are completed, the backward recursive sampling executes in O(n) time, providing a very efficient approach for generating representative structures. We demonstrate the utility of the Bayesian approach with several tRNA sequences. The potential of the approach for predicting RNA secondary structures and presenting alternative structures is illustrated with applications to the Escherichia coli tRNA^Ala sequence and the Xenopus laevis oocyte 5S rRNA sequence.
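	The abstract states that the forward recursions give exact distributions of count variables such as the number of destabilizing loops. A minimal sketch of that idea follows, under a toy model that is an assumption and not the paper's stacking-energy model or its Bayesian treatment of parameters: Watson-Crick and GU pairs are allowed, hairpins need at least 3 unpaired bases, a structure with K base pairs gets weight w**K for a fixed w, and the forward table stores, for every subsequence, how many structures have each possible number of pairs. The names PAIRS, MIN_LOOP, pair_count_table and pair_number_distribution are hypothetical.

```python
# Toy model (an assumption, not the paper's energy model): Watson-Crick and GU
# pairs are allowed, a hairpin loop needs at least MIN_LOOP unpaired bases, and
# a structure with K base pairs gets weight w**K.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3


def add_poly(p, q):
    """Coefficient-wise sum of two count vectors (index = number of pairs)."""
    out = [0] * max(len(p), len(q))
    for k, c in enumerate(p):
        out[k] += c
    for k, c in enumerate(q):
        out[k] += c
    return out


def mul_poly(p, q):
    """Convolution: combine counts of two independent segments."""
    out = [0] * (len(p) + len(q) - 1)
    for a, ca in enumerate(p):
        for b, cb in enumerate(q):
            out[a + b] += ca * cb
    return out


def pair_count_table(seq):
    """Forward recursion: N[i][j][k] = number of structures on seq[i:j] with k pairs."""
    n = len(seq)
    # An empty segment has exactly one structure, with zero pairs.
    N = [[[1] for _ in range(n + 1)] for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            total = N[i][j - 1]  # case 1: the last base is unpaired
            for k in range(i, last - MIN_LOOP):  # case 2: last base pairs with k
                if (seq[k], seq[last]) in PAIRS:
                    inner = mul_poly(N[i][k], N[k + 1][last])
                    total = add_poly(total, [0] + inner)  # shift: one more pair
            N[i][j] = total
    return N


def pair_number_distribution(seq, w):
    """Exact P(K = k) when each structure is weighted by w**K."""
    counts = pair_count_table(seq)[0][len(seq)]
    weights = [c * w**k for k, c in enumerate(counts)]
    z = sum(weights)
    return [x / z for x in weights]


print(pair_count_table("GGGAAACCC")[0][9])  # [1, 9, 9, 1]
print(pair_number_distribution("GGGAAACCC", 1.0))
```

	For "GGGAAACCC" every structure pairs some k of the G's with k of the C's in nested order, so there are C(3,k)^2 structures with k pairs; with w = 1 the distribution is [0.05, 0.45, 0.45, 0.05]. In the paper the parameters themselves are uncertain and integrated over; here w is simply fixed.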
	[2] A statistical sampling algorithm for RNA secondary structure prediction. Ye Ding and Charles E. Lawrence. Nucleic Acids Research, Vol. 31, No. 24, 2003, Pages 7280-7301.
	An RNA molecule, particularly a long-chain mRNA, may exist as a population of structures. Furthermore, multiple structures have been demonstrated to play important functional roles. Thus, a representation of the ensemble of probable structures is of interest. We present a statistical algorithm to sample rigorously and exactly from the Boltzmann ensemble of secondary structures. The forward step of the algorithm computes the equilibrium partition functions of RNA secondary structures with recent thermodynamic parameters. Using conditional probabilities computed with the partition functions in a recursive sampling process, the backward step of the algorithm quickly generates a statistically representative sample of structures. With cubic run time for the forward step, quadratic run time in the worst case for the sampling step, and quadratic storage, the algorithm is efficient for broad applicability. We demonstrate that, by classifying sampled structures, the algorithm enables a statistical delineation and representation of the Boltzmann ensemble. Applications of the algorithm show that alternative biological structures are revealed through sampling. Statistical sampling provides a means to estimate the probability of any structural motif, with or without constraints. For example, the algorithm enables probability profiling of single-stranded regions in RNA secondary structure. Probability profiling for specific loop types is also illustrated. By overlaying probability profiles, a mutual accessibility plot can be displayed for predicting RNA:RNA interactions. Boltzmann probability-weighted density of states and free energy distributions of sampled structures can be readily computed.
We show that a sample of moderate size from the ensemble of an enormous number of possible structures is sufficient to guarantee statistical reproducibility in the estimates of typical sampling statistics. Our applications suggest that the sampling algorithm may be well suited to prediction of mRNA structure and target accessibility. The algorithm is applicable to the rational design of small interfering RNAs (siRNAs), antisense oligonucleotides, and trans-cleaving ribozymes in gene knock-down studies.
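	The forward/backward scheme described in this abstract (partition functions first, then sampling with conditional probabilities) can be sketched under a toy model. This is an illustrative assumption, not the paper's implementation: the paper uses loop-based recursions with thermodynamic parameters, while here every allowed base pair (Watson-Crick or GU, hairpin of at least 3 bases) contributes the same weight w, standing in for a Boltzmann factor exp(-E/RT). The forward table still takes O(n^3) time and O(n^2) memory, and one traceback does at most O(n^2) work. The functions partition_table, sample_structure, unpaired_profile and standard_error are hypothetical names.

```python
import math
import random

# Toy model (an assumption): WC/GU pairs, hairpins of >= MIN_LOOP bases, and
# weight w**K for a structure with K pairs (w stands in for exp(-E/RT)).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3


def partition_table(seq, w):
    """Forward step: Z[i][j] = total weight of all structures on seq[i:j]."""
    n = len(seq)
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty segment has weight 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            total = Z[i][j - 1]  # last base unpaired
            for k in range(i, last - MIN_LOOP):
                if (seq[k], seq[last]) in PAIRS:  # last base paired with k
                    total += Z[i][k] * w * Z[k + 1][last]
            Z[i][j] = total
    return Z


def sample_structure(seq, Z, w, rng):
    """Backward step: one stochastic traceback, returned in dot-bracket notation."""
    s = ["."] * len(seq)
    todo = [(0, len(seq))]  # segments still to be resolved
    while todo:
        i, j = todo.pop()
        if j - i < MIN_LOOP + 2:
            continue  # too short to contain any pair
        last = j - 1
        # Each option is drawn with probability (its weight) / Z[i][j].
        choices, weights = [None], [Z[i][j - 1]]
        for k in range(i, last - MIN_LOOP):
            if (seq[k], seq[last]) in PAIRS:
                choices.append(k)
                weights.append(Z[i][k] * w * Z[k + 1][last])
        k = rng.choices(choices, weights=weights)[0]
        if k is None:
            todo.append((i, j - 1))
        else:
            s[k], s[last] = "(", ")"
            todo.append((i, k))
            todo.append((k + 1, last))
    return "".join(s)


def unpaired_profile(samples):
    """Estimated probability that each position is single-stranded."""
    n = len(samples[0])
    return [sum(s[p] == "." for s in samples) / len(samples) for p in range(n)]


def standard_error(p_hat, n_samples):
    """Binomial standard error of a probability estimated from i.i.d. samples."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_samples)


rng = random.Random(0)
seq, w = "GGGAAACCC", 1.0
Z = partition_table(seq, w)
samples = [sample_structure(seq, Z, w, rng) for _ in range(1000)]
profile = unpaired_profile(samples)
print(Z[0][len(seq)])             # 20.0 (20 structures, each of weight 1)
print(profile[3:6])               # [1.0, 1.0, 1.0]: the A's can never pair here
print(standard_error(0.5, 1000))  # worst case, about 0.0158
```

	Because each traceback is an independent draw, any probability estimated from N samples has binomial standard error sqrt(p(1-p)/N), at most 1/(2*sqrt(N)) (about 0.016 for N = 1000). That bound does not depend on how many structures the ensemble contains, which is one way to see why a sample of moderate size can give reproducible estimates; it is a generic statistical argument, not the analysis reported in the paper.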