Difference: AbstractsJan2006 (1 vs. 5)

Revision 5 2006-01-23 - baharak

Line: 1 to 1
 
META TOPICPARENT name="BioinformaticsReadingGroup"
Deleted:
<
<
 VoteAbstractsJan2006
Line: 99 to 98
  Microarrays represent a powerful technology that provides the ability to simultaneously measure the expression of thousands of genes. However, it is a multi-step process with numerous potential sources of variation that can compromise data analysis and interpretation if left uncontrolled, necessitating the development of quality control protocols to ensure assay consistency and high-quality data. In response to emerging standards, such as the minimum information about a microarray experiment standard, tools are required to ascertain the quality and reproducibility of results within and across studies. To this end, an intralaboratory quality control protocol for two color, spotted microarrays was developed using cDNA microarrays from in vivo and in vitro dose-response and time-course studies. The protocol combines: (i) diagnostic plots monitoring the degree of feature saturation, global feature and background intensities, and feature misalignments with (ii) plots monitoring the intensity distributions within arrays with (iii) a support vector machine (SVM) model. The protocol is applicable to any laboratory with sufficient datasets to establish historical high- and low-quality data.
Added:
>
>

Bioinformatics (selected by Baharak)

[B1] Sequence-based heuristics for faster annotation of non-coding RNA families, Zasha Weinberg and Walter L. Ruzzo.

Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool to find new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be.

In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, where our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that—unlike family-specific solutions—can scale to hundreds of ncRNA families.

[B2] QUASAR—scoring and ranking of sequence–structure alignments , Fabian Birzele , Jan E. Gewehr and Ralf Zimmer .

Sequence–structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence–structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence–structure alignments ranking) provides a unifying framework for scoring sequence–structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against ‘standard-of-truth’ structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.

[B3] Discovering hidden viral piracy, Eddo Kim and Yossef Kliger.

Viruses and developers of anti-inflammatory therapies share a common interest in proteins that manipulate the immune response. Large double-stranded DNA viruses acquire host proteins to evade host defense mechanisms. Hence, viral pirated proteins may have a therapeutic potential. Although dozens of viral piracy events have already been identified, we hypothesized that sequence divergence impedes the discovery of many others.

We developed a method to assess the number of viral/human homologs and discovered that at least 917 highly diverged homologs are hidden in low-similarity alignment hits that are usually ignored. However, these low-similarity homologs are masked by many false alignment hits. We therefore applied a filtering method to increase the proportion of viral/human homologous proteins. The homologous proteins we found may facilitate functional annotation of viral and human proteins. Furthermore, some of these proteins play a key role in immune modulation and are therefore therapeutic protein candidates.

[B4] Analysis of differentially-regulated genes within a regulatory network by GPS genome navigation, Igor Zwir, Henry Huang and Eduardo A. Groisman.

A critical challenge of the post-genomic era is to understand how genes are differentially regulated even when they belong to a given network. Because the fundamental mechanism controlling gene expression operates at the level of transcription initiation, computational techniques have been developed that identify cis regulatory features and map such features into expression patterns to classify genes into distinct networks. However, these methods are not focused on distinguishing between differentially regulated genes within a given network. Here we describe an unsupervised machine learning method, termed GPS for gene promoter scan, that discriminates among co-regulated promoters by simultaneously considering both cis-acting regulatory features and gene expression. GPS is particularly useful for knowledge discovery in environments with reduced datasets and high levels of uncertainty.

Application of this method to the enteric bacteria Escherichia coli and Salmonella enterica uncovered novel members, as well as regulatory interactions in the regulon controlled by the PhoP protein that were not discovered using previous approaches. The predictions made by GPS were experimentally validated to establish that the PhoP protein uses multiple mechanisms to control gene transcription, and is a central element in a highly connected network.

[B5] Using information theory to search for co-evolving residues in proteins, L. C. Martin, G. B. Gloor, S. D. Dunn and L. M. Wahl.

Some functionally important protein residues are easily detected since they correspond to conserved columns in a multiple sequence alignment (MSA). However, important residues may also mutate, with compensatory mutations occurring elsewhere in the protein that serve to preserve or restore functionality. It is difficult to distinguish these co-evolving sites from other non-conserved sites.

Results: We used Mutual Information (MI) to identify co-evolving positions. Using in silico evolved MSAs, we examined the effects of the number of sequences, the size of amino acid alphabet and the mutation rate on two sources of background MI: finite sample size effects and phylogenetic influence. We then assessed the performance of various normalizations of MI in enhancing detection of co-evolving positions and found that normalization by the pair entropy was optimal. Real protein alignments were analyzed and co-evolving isolated pairs were often found to be in contact with each other.
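As a rough illustration of the quantity discussed in [B5], the following minimal Python sketch computes the mutual information between two alignment columns and divides it by the pair entropy H(X, Y). The function names and the toy alignment are assumptions made for this example; they are not taken from the paper, and the sketch ignores gaps, sequence weighting and the phylogenetic background effects the abstract mentions.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(col_x, col_y):
    """MI between two alignment columns: H(X) + H(Y) - H(X, Y)."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    pairs = list(zip(col_x, col_y))
    return entropy(col_x) + entropy(col_y) - entropy(pairs)


def normalized_mi(col_x, col_y):
    """MI divided by the pair entropy H(X, Y); 0.0 when H(X, Y) is 0."""
    h_xy = entropy(list(zip(col_x, col_y)))
    if h_xy == 0:
        return 0.0
    return mutual_information(col_x, col_y) / h_xy


# Toy MSA (hypothetical): columns 0 and 1 covary, column 2 is conserved.
msa = ["AKG", "AKG", "DEG", "DEG"]
columns = list(zip(*msa))

print(normalized_mi(columns[0], columns[1]))  # covarying pair
print(normalized_mi(columns[0], columns[2]))  # conserved column carries no MI
```

In this toy case the covarying pair reaches the maximum normalized value, while any pair involving the fully conserved column scores zero, which is why conservation alone cannot reveal co-evolving sites.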

Revision 4 2006-01-23 - MirelaAndronescu

Line: 1 to 1
 
META TOPICPARENT name="BioinformaticsReadingGroup"
Added:
>
>
VoteAbstractsJan2006

RNA Journal (suggested by Holger)

[H1] Natural selection is not required to explain universal compositional patterns in rRNA secondary structure categories, SANDRA SMIT, MICHAEL YARUS and ROB KNIGHT

We have encountered an unexpected property of rRNA secondary structures that may generalize to all RNAs. Analysis of 8892 ribosomal RNA sequences and structures from a wide range of species revealed unexpected universal compositional trends. First, different categories of rRNA secondary structure (stems, loops, bulges, and junctions) have distinct, characteristic base compositions. Second, the observed patterns of variation are similar among sequences from large and small rRNA subunits and all domains of life, despite extensive evolutionary divergence. Surprisingly, these differences do not seem to be related to selection for different compositions in different structural categories, but rather relate to the overall composition of the molecule: Randomized RNAs with no evolutionary history show the same structure-dependent compositional biases as rRNAs. These compositional trends may improve the accuracy of RNA secondary structure prediction, because they allow us to compare predicted structures against known compositional preferences. They also suggest caution in interpreting differences in the rate of change of the GC content in different parts of the molecule as evidence of differential selection.

[H2] Topology of three-way junctions in folded RNAs, AURÉLIE LESCOUTE and ERIC WESTHOF

The three-way junctions contained in X-ray structures of folded RNAs have been compiled and analyzed. Three-way junctions with two helices approximately coaxially stacked can be divided into three main families depending on the relative lengths of the segments linking the three Watson-Crick helices. Each family has topological characteristics with some conservation in the non-Watson-Crick pairs within the linking segments as well as in the types of contacts between the segments and the helices. The most populated family presents tertiary interactions between two helices as well as extensive shallow/minor groove contacts between a linking segment and the third helix. On the basis of the lengths of the linking segments, some guidelines could be deduced for choosing a topology for a three-way junction on the basis of a secondary structure. Examples and prediction based on those rules are discussed.

Other (suggested by Holger)

[H3] Protein similarity search under mRNA structural constraints: application to selenocysteine incorporation, Rolf Backofen, N. S. Narayanaswamy and Firas Swidan

Selenocysteine is the 21st amino acid, which occurs in all kingdoms of life. Selenocysteine is encoded by the STOP-codon UGA. For its insertion, it requires a specific mRNA sequence downstream of the UGA-codon that forms a hairpin-like structure (called the Sec insertion sequence (SECIS)). We consider the computational problem of generating new amino acid sequences containing selenocysteine. This requires finding an mRNA sequence that is similar to the SECIS-consensus, is able to form the secondary structure required for selenocysteine insertion, and whose translation is maximally similar to the original amino acid sequence. We show that the problem can be solved in linear time when the structure does not contain pseudoknots.

 

BMC Bioinformatics (selected by Mirela)

Changed:
<
<
[M1] An Approach for Clustering Gene Expression Data with Error Information, Brian Tjaden
>
>
[M1] An Approach for Clustering Gene Expression Data with Error Information, Brian Tjaden
  Background. Clustering of gene expression patterns is a well-studied technique for elucidating trends across large numbers of transcripts and for identifying likely co-regulated genes. Even the best clustering methods, however, are unlikely to provide meaningful results if too much of the data is unreliable. With the maturation of microarray technology, a wealth of research on statistical analysis of gene expression data has encouraged researchers to consider error and uncertainty in their microarray experiments, so that experiments are being performed increasingly with repeat spots per gene per chip and with repeat experiments. One of the challenges is to incorporate the measurement error information into downstream analyses of gene expression data, such as traditional clustering techniques.
Line: 11 to 34
  Conclusions. The additional information provided by replicate gene expression measurements is a valuable asset in effective clustering. Gene expression profiles with high errors, as determined from repeat measurements, may be unreliable and may associate with different clusters, whereas gene expression profiles with low errors can be clustered with higher specificity. Results indicate that including error information from repeat gene expression measurements can lead to significant improvements in clustering accuracy.
Added:
>
>
[M2] In silico discovery of human natural antisense transcripts, Yuan-Yuan Li, Lei Qin, Zong-Ming Guo, Lei Liu, Hao Xu, Pei Hao, Jiong Su, Yixiang Shi, Wei-Zhong He and Yi-Xue Li

Background. Several high-throughput searches for potential NATs were performed recently, but most of the reports were focused on cis type. A thorough in silico analysis of human transcripts will help expand our knowledge of NATs.

Results. We have identified 568 NATs from human RefSeq RNA sequences. Among them, 403 NATs are reported for the first time, and at least 157 novel NATs are trans type. According to the pairing region of a sense and antisense RNA pair, hNATs are divided into 6 classes, of which about 87% involve 5' or 3' UTR sequences, supporting the regulatory role of UTRs. Among a total of 535 NAT pairs related with splice variants, 77.4% (414/535) have their pairing regions affected or completely eliminated by alternative splicing, suggesting significant relationship of alternative splicing and antisense-directed regulation. The extensive occurrence of splice variants in hNATs and other multiple pairing patterns results in one-to-many relationship, allowing the formation of complex regulation networks. Based on microarray data from Stanford Microarray Database, two hNAT pairs were found to display significant inverse expression patterns before and after insulin injection.

Conclusions. NATs might carry out more extensive and complex functions than previously thought. Combined with endogenous micro RNAs, hNATs could be regarded as a special group of transcripts contributing to the complex regulation networks.

[M3] Construction and validation of the APOCHIP, a spotted oligo-microarray for the study of beta-cell apoptosis, Nils E Magnusson, Alessandra K Cardozo, Mogens Kruhoffer, Decio L Eizirik, Torben F Orntoft and Jens L Jensen

Background. Type 1 diabetes mellitus (T1DM) is an autoimmune disease caused by a long-term negative balance between immune-mediated beta-cell damage and beta-cell repair/regeneration. Following immune-mediated damage, the beta-cell fate depends on several genes up- or down-regulated in parallel and/or sequentially. Based on the information obtained by the analysis of several microarray experiments of beta-cells exposed to pro-apoptotic conditions (e.g. double stranded RNA (dsRNA) and cytokines), we have developed a spotted rat oligonucleotide microarray, the APOCHIP, containing 60-mer probes for 574 genes selected for the study of beta-cell apoptosis.

Results. The APOCHIP was validated by a combination of approaches. First, we performed an internal validation of the spotted probes based on a weighted linear regression model using dilution series experiments. Second, we profiled expression measurements in ten dissimilar rat RNA samples for 515 genes that were represented on both the spotted oligonucleotide collection and on the in situ-synthesized 25-mer arrays (Affymetrix GeneChips). Internal validation showed that most of the spotted probes displayed a pattern of reaction close to that predicted by the model. By using simple rules for comparison of data between platforms, we found strong correlations (median r = 0.84) between relative gene expression measurements made with spotted probes and in situ-synthesized 25-mer probe sets.

Conclusions. Our data suggest that there is a high reproducibility of the APOCHIP in terms of technical replication and that relative gene expression measurements obtained with the APOCHIP compare well to the Affymetrix GeneChip. The APOCHIP is available to the scientific community and is a useful tool to study the molecular mechanisms regulating beta-cell apoptosis.

 

Current Opinion in Structural Biology (selected by Sanja)

Changed:
<
<
[S1] RNA structure: the long and short of it (review article), Stephen R Holbrook
>
>
[S1] RNA structure: the long and short of it (review article), Stephen R Holbrook
  The database of RNA structure has grown tremendously since the crystal structure analyses of ribosomal subunits in 2000–2001. During the past year, the trend toward determining the structure of large, complex biological RNAs has accelerated, with the analysis of three intact group I introns, A- and B-type ribonuclease P RNAs, a riboswitch–substrate complex and other structures. The growing database of RNA structures, coupled with efforts directed at the standardization of nomenclature and classification of motifs, has resulted in the identification and characterization of numerous RNA secondary and tertiary structure motifs. Because a large proportion of RNA structure can now be shown to be composed of these recurring structural motifs, a view of RNA as a modular structure built from a combination of these building blocks and tertiary linkers is beginning to emerge. At the same time, however, more detailed analysis of water, metal, ligand and protein binding to RNA is revealing the effect of these moieties on folding and structure formation. The balance between the views of RNA structure either as strictly a construct of preformed building blocks linked in a limited number of ways or as a flexible polymer assuming a global fold influenced by its environment will be the focus of current and future RNA structural biology.
Changed:
<
<
[S2]Structure, folding and mechanisms of ribozymes (review article), David MJ Lilley
>
>
[S2] Structure, folding and mechanisms of ribozymes (review article), David MJ Lilley
 

The past two years have seen exciting developments in RNA catalysis. A completely new ribozyme (possibly two) has come along and several new structures have been determined, including three different group I intron species. Although the origins of catalysis remain incompletely understood, a significant convergence of views has happened in the past year, together with the discovery of new super-fast ribozymes. There is persuasive evidence of general acid-base chemistry in nucleolytic ribozymes, whereas catalysis of peptidyl transfer in the ribosome seems to result largely from orientation and proximity effects. Lastly, important new folding-enhancing elements have been discovered.

Science (selected by Sanja)

Changed:
<
<
[S3] The Widespread Impact of Mammalian MicroRNAs on mRNA Repression and Evolution, Farh et al.
>
>
[S3] The Widespread Impact of Mammalian MicroRNAs on mRNA Repression and Evolution, Farh et al.
  Thousands of mammalian messenger RNAs are under selective pressure to maintain 7-nucleotide sites matching microRNAs (miRNAs). We found that these conserved targets are often highly expressed at developmental stages before miRNA expression and that their levels tend to fall as the miRNA that targets them begins to accumulate. Nonconserved sites, which outnumber the conserved sites 10 to 1, also mediate repression. As a consequence, genes preferentially expressed at the same time and place as a miRNA have evolved to selectively avoid sites matching the miRNA. This phenomenon of selective avoidance extends to thousands of genes and enables spatial and temporal specificities of miRNAs to be revealed by finding tissues and developmental stages in which messages with corresponding sites are expressed at lower levels.
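To make the notion of a "7-nucleotide site matching a miRNA" in [S3] concrete, here is a minimal Python sketch that locates such sites in a UTR sequence. It assumes, as a common convention not stated in the abstract, that a site is the reverse complement of miRNA nucleotides 2-8; the function names and both sequences are hypothetical toy values, not data from the paper.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, start=1, length=7):
    """mRNA 7-mer (5'->3') complementary to miRNA positions 2-8 (assumed seed)."""
    seed = mirna.upper().replace("T", "U")[start:start + length]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(utr, mirna):
    """0-based start positions of every seed-matching site in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Toy sequences (hypothetical) containing two sites.
toy_mirna = "UGAGGUAGUAGG"
toy_utr = "AAACUACCUCGGGCUACCUCA"

print(seed_site(toy_mirna))
print(find_sites(toy_utr, toy_mirna))
```

Counting such sites across the messages expressed in a given tissue is the kind of bookkeeping that would be needed to look for the selective avoidance the abstract describes; the sketch does not attempt any conservation analysis.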

PLoS Computational Biology (selected by Sanja)

Changed:
<
<
[S4] New Maximum Likelihood Estimators for Eukaryotic Intron Evolution, Nguyen et al.
>
>
[S4] New Maximum Likelihood Estimators for Eukaryotic Intron Evolution, Nguyen et al.
  The evolution of spliceosomal introns remains poorly understood. Although many approaches have been used to infer intron evolution from the patterns of intron position conservation, the results to date have been contradictory. In this paper, we address the problem using a novel maximum likelihood method, which allows estimation of the frequency of intron insertion target sites, together with the rates of intron gain and loss. We analyzed the pattern of 10,044 introns (7,221 intron positions) in the conserved regions of 684 sets of orthologs from seven eukaryotes. We determined that there is an average of one target site per 11.86 base pairs (bp) (95% confidence interval, 9.27 to 14.39 bp). In addition, our results showed that: (i) overall intron gains are ~25% greater than intron losses, although specific patterns vary with time and lineage; (ii) parallel gains account for ~18.5% of shared intron positions; and (iii) reacquisition following loss accounts for ~0.5% of all intron positions. Our results should assist in resolving the long-standing problem of inferring the evolution of spliceosomal introns.
Changed:
<
<
[S5] Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model, Lunter et al.
>
>
[S5] Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model, Lunter et al.
  It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified.
Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
Changed:
<
<
[S6]Ten Simple Rules for Getting Published (I am not proposing this for presentation but as an interesting article for all of us to read), Philip E. Bourne
>
>
[S6]Ten Simple Rules for Getting Published (I am not proposing this for presentation but as an interesting article for all of us to read), Philip E. Bourne
 

NAR (selected by Dan)

Revision 3 2006-01-19 - DanTulpan

Line: 1 to 1
 
META TOPICPARENT name="BioinformaticsReadingGroup"

BMC Bioinformatics (selected by Mirela)

Line: 45 to 45
  [S6]Ten Simple Rules for Getting Published (I am not proposing this for presentation but as an interesting article for all of us to read), Philip E. Bourne
Added:
>
>

NAR (selected by Dan)

[D1] Application of a superword array in genome assembly, Xiaoqiu Huang, Shiaw-Pyng Yang, Asif T. Chinwalla, LaDeana W. Hillier, Patrick Minx, Elaine R. Mardis, and Richard K. Wilson.

We introduce a data structure called a superword array for quickly finding matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP.

[D2] Automatic assessment of alignment quality, Timo Lassmann and Erik L. L. Sonnhammer.

Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant, solution to assess the biological accuracy of alignments automatically. Our approach is based on the comparison of several alignments of the same sequences. We introduce two functions to compare alignments: the average overlap score and the multiple overlap score. The former identifies difficult alignment cases by expressing the similarity among several alignments, while the latter estimates the biological correctness of individual alignments. We implemented both functions in the MUMSA program and demonstrate the overall robustness and accuracy of both functions on three large benchmark sets.
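The abstract of [D2] does not define its scores, so the following Python sketch only illustrates the general idea of comparing several alignments of the same sequences. It assumes that the overlap between two alignments is the number of aligned residue pairs they share, relative to the average number of pairs in the two, and that the average overlap is the mean over all alignment pairs. These definitions, the function names and the toy alignments are assumptions for illustration and should not be read as the MUMSA implementation.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs ((seq, pos), (seq, pos)) placed in the same column."""
    pairs = set()
    counters = [0] * len(alignment)  # next ungapped residue index per sequence
    for column in zip(*alignment):
        residues = []
        for seq_index, char in enumerate(column):
            if char != "-":
                residues.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def overlap(aln_a, aln_b):
    """Shared aligned pairs divided by the mean pair count (assumed definition)."""
    pairs_a, pairs_b = aligned_pairs(aln_a), aligned_pairs(aln_b)
    total = len(pairs_a) + len(pairs_b)
    return 2 * len(pairs_a & pairs_b) / total if total else 1.0


def average_overlap(alignments):
    """Mean pairwise overlap across a collection of alignments."""
    if len(alignments) < 2:
        raise ValueError("need at least two alignments to compare")
    scores = [overlap(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores)


# Two toy alignments (hypothetical) of the same sequences ACGT and AGCT.
aln1 = ["ACG-T", "A-GCT"]
aln2 = ["ACG-T", "AG-CT"]

print(overlap(aln1, aln2))
print(average_overlap([aln1, aln1, aln2]))
```

Under these assumptions a low average overlap would flag a hard alignment case, since independent alignments of the same sequences disagree on which residues belong together.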

[D3] Integrating probe-level expression changes across generations of Affymetrix arrays, Laura L. Elo, Leo Lahti, Heli Skottman, Minna Kyläniemi, Riitta Lahesmaa, and Tero Aittokallio.

There is an urgent need for bioinformatic methods that allow integrative analysis of multiple microarray data sets. While previous studies have mainly concentrated on reproducibility of gene expression levels within or between different platforms, we propose a novel meta-analytic method that takes into account the vast amount of available probe-level information to combine the expression changes across different studies. We first show that the comparability of relative expression changes and the consistency of differentially expressed genes between different Affymetrix array generations can be considerably improved by determining the expression changes at the probe-level and by considering the latest information on probe-level sequence matching instead of the probe annotations provided by the manufacturer. With the improved probe-level expression change estimates, data from different generations of Affymetrix arrays can be combined more effectively. This will allow for the full exploitation of existing results when designing and analyzing new experiments.

[D4] Protocols for the assurance of microarray data quality and process control, L. D. Burgoon, J. E. Eckel-Passow, C. Gennings, D. R. Boverhof, J. W. Burt, C. J. Fong, and T. R. Zacharewski.

Microarrays represent a powerful technology that provides the ability to simultaneously measure the expression of thousands of genes. However, it is a multi-step process with numerous potential sources of variation that can compromise data analysis and interpretation if left uncontrolled, necessitating the development of quality control protocols to ensure assay consistency and high-quality data. In response to emerging standards, such as the minimum information about a microarray experiment standard, tools are required to ascertain the quality and reproducibility of results within and across studies. To this end, an intralaboratory quality control protocol for two color, spotted microarrays was developed using cDNA microarrays from in vivo and in vitro dose-response and time-course studies. The protocol combines: (i) diagnostic plots monitoring the degree of feature saturation, global feature and background intensities, and feature misalignments with (ii) plots monitoring the intensity distributions within arrays with (iii) a support vector machine (SVM) model. The protocol is applicable to any laboratory with sufficient datasets to establish historical high- and low-quality data.

Revision 2 2006-01-18 - rogic

Line: 1 to 1
 
META TOPICPARENT name="BioinformaticsReadingGroup"
Deleted:
<
<
 

BMC Bioinformatics (selected by Mirela)

[M1] An Approach for Clustering Gene Expression Data with Error Information,

Line: 13 to 11
  Conclusions. The additional information provided by replicate gene expression measurements is a valuable asset in effective clustering. Gene expression profiles with high errors, as determined from repeat measurements, may be unreliable and may associate with different clusters, whereas gene expression profiles with low errors can be clustered with higher specificity. Results indicate that including error information from repeat gene expression measurements can lead to significant improvements in clustering accuracy.
Added:
>
>

Current Opinion in Structural Biology (selected by Sanja)

[S1] RNA structure: the long and short of it (review article), Stephen R Holbrook

The database of RNA structure has grown tremendously since the crystal structure analyses of ribosomal subunits in 2000–2001. During the past year, the trend toward determining the structure of large, complex biological RNAs has accelerated, with the analysis of three intact group I introns, A- and B-type ribonuclease P RNAs, a riboswitch–substrate complex and other structures. The growing database of RNA structures, coupled with efforts directed at the standardization of nomenclature and classification of motifs, has resulted in the identification and characterization of numerous RNA secondary and tertiary structure motifs. Because a large proportion of RNA structure can now be shown to be composed of these recurring structural motifs, a view of RNA as a modular structure built from a combination of these building blocks and tertiary linkers is beginning to emerge. At the same time, however, more detailed analysis of water, metal, ligand and protein binding to RNA is revealing the effect of these moieties on folding and structure formation. The balance between the views of RNA structure either as strictly a construct of preformed building blocks linked in a limited number of ways or as a flexible polymer assuming a global fold influenced by its environment will be the focus of current and future RNA structural biology.

[S2]Structure, folding and mechanisms of ribozymes (review article), David MJ Lilley

The past two years have seen exciting developments in RNA catalysis. A completely new ribozyme (possibly two) has come along and several new structures have been determined, including three different group I intron species. Although the origins of catalysis remain incompletely understood, a significant convergence of views has happened in the past year, together with the discovery of new super-fast ribozymes. There is persuasive evidence of general acid-base chemistry in nucleolytic ribozymes, whereas catalysis of peptidyl transfer in the ribosome seems to result largely from orientation and proximity effects. Lastly, important new folding-enhancing elements have been discovered.

Science (selected by Sanja)

[S3] The Widespread Impact of Mammalian MicroRNAs on mRNA Repression and Evolution, Farh et al.

Thousands of mammalian messenger RNAs are under selective pressure to maintain 7-nucleotide sites matching microRNAs (miRNAs). We found that these conserved targets are often highly expressed at developmental stages before miRNA expression and that their levels tend to fall as the miRNA that targets them begins to accumulate. Nonconserved sites, which outnumber the conserved sites 10 to 1, also mediate repression. As a consequence, genes preferentially expressed at the same time and place as a miRNA have evolved to selectively avoid sites matching the miRNA. This phenomenon of selective avoidance extends to thousands of genes and enables spatial and temporal specificities of miRNAs to be revealed by finding tissues and developmental stages in which messages with corresponding sites are expressed at lower levels.
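As a rough illustration of what a 7-nucleotide site match might look like, the sketch below scans a sequence for the reverse complement of miRNA positions 2-8 (often called the seed). The choice of positions 2-8, the function names `seed_site` and `find_sites`, and the toy UTR are assumptions made for illustration; the abstract itself does not specify the site definition.

```python
# Sketch only: assumes a 7-nt site is the reverse complement of miRNA
# positions 2-8 (the "seed"); the abstract does not define the site type.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA 7-mer that would pair with miRNA positions 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(utr: str, mirna: str) -> list[int]:
    """Return 0-based start positions of seed-matched sites in a sequence."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# let-7a miRNA sequence; the UTR below is a made-up toy example.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
TOY_UTR = "AAGCUACCUCAGGGACUACCUCUU"
print(seed_site(LET7A))
print(find_sites(TOY_UTR, LET7A))
```

Counting such sites per message, and comparing counts between genes co-expressed with the miRNA and genes expressed elsewhere, is one way the avoidance pattern described above could be explored.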

PLoS Computational Biology (selected by Sanja)

[S4] New Maximum Likelihood Estimators for Eukaryotic Intron Evolution, Nguyen et al.

The evolution of spliceosomal introns remains poorly understood. Although many approaches have been used to infer intron evolution from the patterns of intron position conservation, the results to date have been contradictory. In this paper, we address the problem using a novel maximum likelihood method, which allows estimation of the frequency of intron insertion target sites, together with the rates of intron gain and loss. We analyzed the pattern of 10,044 introns (7,221 intron positions) in the conserved regions of 684 sets of orthologs from seven eukaryotes. We determined that there is an average of one target site per 11.86 base pairs (bp) (95% confidence interval, 9.27 to 14.39 bp). In addition, our results showed that: (i) overall intron gains are ~25% greater than intron losses, although specific patterns vary with time and lineage; (ii) parallel gains account for ~18.5% of shared intron positions; and (iii) reacquisition following loss accounts for ~0.5% of all intron positions. Our results should assist in resolving the long-standing problem of inferring the evolution of spliceosomal introns.

[S5] Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model, Lunter et al.

It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified. Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
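The "more than half" statement in this abstract can be checked with a line of arithmetic. The sketch below assumes, for simplicity, that every annotated protein-coding base (1.2% of euchromatin) falls inside the fraction under indel-purifying selection; that assumption and the helper name `noncoding_share` are mine, not the authors'.

```python
# Sketch only: assumes all annotated protein-coding sequence (1.2% of
# euchromatin) lies inside the indel-purified fraction.
CODING = 1.2  # percent of euchromatin, as quoted in the abstract


def noncoding_share(selected_percent: float) -> float:
    """Fraction of the selected sequence that is not protein-coding."""
    return 1.0 - CODING / selected_percent


for selected in (2.56, 3.25):  # lower and upper bounds from the abstract
    print(f"{selected}% selected: {noncoding_share(selected):.0%} non-coding")
```

Under that assumption the non-coding share comes out at roughly 53% to 63%, consistent with the abstract's claim.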

[S6] Ten Simple Rules for Getting Published (I am not proposing this for presentation but as an interesting article for all of us to read), Philip E. Bourne

Revision 1 2006-01-18 - MirelaAndronescu

Line: 1 to 1
Added:
META TOPICPARENT name="BioinformaticsReadingGroup"

BMC Bioinformatics (selected by Mirela)

[M1] An Approach for Clustering Gene Expression Data with Error Information, Brian Tjaden

Background. Clustering of gene expression patterns is a well-studied technique for elucidating trends across large numbers of transcripts and for identifying likely co-regulated genes. Even the best clustering methods, however, are unlikely to provide meaningful results if too much of the data is unreliable. With the maturation of microarray technology, a wealth of research on statistical analysis of gene expression data has encouraged researchers to consider error and uncertainty in their microarray experiments, so that experiments are being performed increasingly with repeat spots per gene per chip and with repeat experiments. One of the challenges is to incorporate the measurement error information into downstream analyses of gene expression data, such as traditional clustering techniques.

Results. In this study, a clustering approach is presented which incorporates both gene expression values and error information about the expression measurements. Using repeat expression measurements, the error of each gene expression measurement in each experiment condition is estimated, and this measurement error information is incorporated directly into the clustering algorithm. The algorithm, CORE (Clustering Of Repeat Expression data), is presented and its performance is validated using statistical measures. By using error information about gene expression measurements, the clustering approach is less sensitive to noise in the underlying data and it is able to achieve more accurate clusterings. Results are described for both synthetic expression data as well as real gene expression data from Escherichia coli and Saccharomyces cerevisiae.

Conclusions. The additional information provided by replicate gene expression measurements is a valuable asset in effective clustering. Gene expression profiles with high errors, as determined from repeat measurements, may be unreliable and may associate with different clusters, whereas gene expression profiles with low errors can be clustered with higher specificity. Results indicate that including error information from repeat gene expression measurements can lead to significant improvements in clustering accuracy.
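To make the idea of clustering with error information concrete, here is a minimal sketch of one way repeat measurements could feed into a distance function: per-condition standard errors are estimated from replicates, and squared differences are divided by the combined variance. This is an inverse-variance weighting chosen for illustration, not the CORE algorithm; the function names, the `floor` parameter, and the toy data are all hypothetical.

```python
# Sketch only: an inverse-variance weighted distance, NOT the CORE algorithm.
from statistics import mean, stdev


def summarize(replicates: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-condition mean and standard error from repeat measurements."""
    means = [mean(r) for r in replicates]
    errors = [stdev(r) / len(r) ** 0.5 for r in replicates]
    return means, errors


def weighted_distance(gene_a, gene_b, floor: float = 1e-6) -> float:
    """Squared differences down-weighted by the combined measurement variance."""
    (mean_a, err_a), (mean_b, err_b) = gene_a, gene_b
    return sum((a - b) ** 2 / (ea ** 2 + eb ** 2 + floor)
               for a, b, ea, eb in zip(mean_a, mean_b, err_a, err_b))


# Toy data: two conditions, three repeat measurements each.
precise = summarize([[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]])
noisy = summarize([[1.0, 2.0, 0.0], [2.0, 3.0, 1.0]])
reference = summarize([[3.0, 3.1, 2.9], [4.0, 4.1, 3.9]])
print(weighted_distance(precise, reference))
print(weighted_distance(noisy, reference))
```

The precise and noisy profiles have the same means, yet the noisy one ends up much closer to the reference because its differences carry less evidential weight. Such a distance could be plugged into a standard clustering routine as a starting point for discussion.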
