Difference: AbstractsJan2006 (4 vs. 5)

Revision 5 - 2006-01-23 - baharak

Line: 1 to 1
 
META TOPICPARENT name="BioinformaticsReadingGroup"
Deleted:
 VoteAbstractsJan2006
Line: 99 to 98
  Microarrays represent a powerful technology that provides the ability to simultaneously measure the expression of thousands of genes. However, microarray analysis is a multi-step process with numerous potential sources of variation that can compromise data analysis and interpretation if left uncontrolled, necessitating the development of quality control protocols to ensure assay consistency and high-quality data. In response to emerging standards, such as the Minimum Information About a Microarray Experiment (MIAME) standard, tools are required to ascertain the quality and reproducibility of results within and across studies. To this end, an intralaboratory quality control protocol for two-color, spotted microarrays was developed using cDNA microarrays from in vivo and in vitro dose-response and time-course studies. The protocol combines (i) diagnostic plots monitoring the degree of feature saturation, global feature and background intensities, and feature misalignments, (ii) plots monitoring the intensity distributions within arrays, and (iii) a support vector machine (SVM) model. The protocol is applicable to any laboratory with sufficient datasets to establish historical high- and low-quality data.
Added:

Bioinformatics (selected by Baharak)

[B1] Sequence-based heuristics for faster annotation of non-coding RNA families, Zasha Weinberg and Walter L. Ruzzo.

Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, searches with these rigorous filters are still slower than searches with heuristic filters could be.

In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, which incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families.
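The general filter-then-verify idea can be illustrated with a minimal Python sketch. It is not Weinberg and Ruzzo's method: the cheap stage here is an ungapped log-odds profile built from equal-length seed sequences (a deliberately simplified stand-in for a profile HMM, assuming a uniform background and a pseudocount of 1), and `expensive_score` is a hypothetical placeholder for the slow covariance-model scorer. Only windows scoring at or above a hypothetical `threshold` reach the expensive stage, which is where any speed-up would come from.

```python
import math
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical sketch: a cheap ungapped log-odds profile stands in for a
# profile-HMM filter; an expensive callable stands in for a CM search.


def log_odds_profile(seeds: List[str], alphabet: str = "ACGU",
                     pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position log-odds scores from equal-length, gap-free seed sequences."""
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    profile = []
    for i in range(len(seeds[0])):
        counts = {a: pseudocount for a in alphabet}
        for seq in seeds:
            counts[seq[i]] += 1
        total = sum(counts.values())
        profile.append({a: math.log((counts[a] / total) / background)
                        for a in alphabet})
    return profile


def window_scores(profile: List[Dict[str, float]],
                  genome: str) -> Iterator[Tuple[int, float]]:
    """Yield (start, score) for every window of the profile's length.

    Assumes the genome contains only letters from the profile's alphabet.
    """
    width = len(profile)
    for start in range(len(genome) - width + 1):
        yield start, sum(profile[i][genome[start + i]] for i in range(width))


def filter_then_verify(profile: List[Dict[str, float]], genome: str,
                       threshold: float,
                       expensive_score: Callable[[str], float]) -> Dict[int, float]:
    """Run the expensive scorer only on windows that pass the cheap filter."""
    width = len(profile)
    survivors = [start for start, score in window_scores(profile, genome)
                 if score >= threshold]
    return {start: expensive_score(genome[start:start + width])
            for start in survivors}


# Toy usage: the lambda is only a placeholder for a slow CM search.
profile = log_odds_profile(["GGAU", "GGAU", "GGCU"])
hits = filter_then_verify(profile, "AAAAGGAUAAAA", threshold=2.0,
                          expensive_score=lambda seq: seq.count("G"))
print(hits)
```

Because windows below the threshold are never rescored, a true family member with a weak primary-sequence signal can be lost; that possible loss is what separates a heuristic filter from the rigorous filters described in the first paragraph.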

[B2] QUASAR—scoring and ranking of sequence–structure alignments, Fabian Birzele, Jan E. Gewehr and Ralf Zimmer.

Sequence–structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence–structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence–structure alignments ranking) provides a unifying framework for scoring sequence–structure alignments that helps users find well-performing combinations of well-known and custom-made scoring schemes. These scoring functions can be benchmarked against widely accepted quality scores such as MaxSub, TMScore, Touch and APDB, enabling users to test their own alignment scores against ‘standard-of-truth’ structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's built-in optimization routines.
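To make the idea of benchmarking score combinations concrete, here is a minimal Python sketch. It assumes each candidate alignment already carries a vector of raw scores from several scoring schemes plus a precomputed reference quality (for example a TM-score against the known structure). The combined score is a weighted sum, and weights are chosen by exhaustive grid search over weight vectors summing to 1, maximizing the mean reference quality of the top-ranked alignment per target. The weighted-sum form, the top-1 objective, the grid search and all function names are illustrative assumptions; the abstract does not describe how QUASAR's own optimization routines work.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# A candidate alignment: (raw scores from several scoring schemes,
# precomputed reference quality such as a TM-score). Assumed representation.
Candidate = Tuple[Sequence[float], float]


def combined_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the individual scoring-scheme values (assumed form)."""
    return sum(w * f for w, f in zip(weights, features))


def mean_top1_quality(pools: Dict[str, List[Candidate]],
                      weights: Sequence[float]) -> float:
    """Mean reference quality of the top-ranked alignment in each target's pool."""
    total = 0.0
    for pool in pools.values():
        best = max(pool, key=lambda cand: combined_score(cand[0], weights))
        total += best[1]
    return total / len(pools)


def grid_search_weights(pools: Dict[str, List[Candidate]], n_features: int,
                        steps: int = 10) -> Tuple[Optional[Tuple[float, ...]], float]:
    """Try every grid weight vector whose entries sum to 1; keep the best."""
    grid = [i / steps for i in range(steps + 1)]
    best_weights, best_quality = None, float("-inf")
    for weights in product(grid, repeat=n_features):
        if abs(sum(weights) - 1.0) > 1e-9:
            continue  # stay on the simplex so weight vectors are comparable
        quality = mean_top1_quality(pools, weights)
        if quality > best_quality:
            best_weights, best_quality = weights, quality
    return best_weights, best_quality


# Toy, made-up pools: two targets, two candidate alignments each.
pools = {
    "target1": [((0.9, 0.1), 0.3), ((0.2, 0.8), 0.8)],
    "target2": [((0.7, 0.2), 0.4), ((0.3, 0.6), 0.7)],
}
weights, quality = grid_search_weights(pools, n_features=2)
print(weights, quality)
```

Grid search is used only because it is easy to read; its cost grows exponentially with the number of scoring schemes, so a real tool would likely need a smarter optimizer.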

[B3] Discovering hidden viral piracy, Eddo Kim and Yossef Kliger.

Viruses and developers of anti-inflammatory therapies share a common interest in proteins that manipulate the immune response. Large double-stranded DNA viruses acquire host proteins to evade host defense mechanisms. Hence, pirated viral proteins may have therapeutic potential. Although dozens of viral piracy events have already been identified, we hypothesized that sequence divergence impedes the discovery of many others.

We developed a method to assess the number of viral/human homologs and discovered that at least 917 highly diverged homologs are hidden in low-similarity alignment hits that are usually ignored. However, these low-similarity homologs are masked by many false alignment hits. We therefore applied a filtering method to increase the proportion of viral/human homologous proteins. The homologous proteins we found may facilitate functional annotation of viral and human proteins. Furthermore, some of these proteins play a key role in immune modulation and are therefore therapeutic protein candidates.

[B4] Analysis of differentially-regulated genes within a regulatory network by GPS genome navigation, Igor Zwir, Henry Huang and Eduardo A. Groisman.

A critical challenge of the post-genomic era is to understand how genes are differentially regulated even when they belong to a given network. Because the fundamental mechanism controlling gene expression operates at the level of transcription initiation, computational techniques have been developed that identify cis regulatory features and map such features into expression patterns to classify genes into distinct networks. However, these methods are not focused on distinguishing between differentially regulated genes within a given network. Here we describe an unsupervised machine learning method, termed GPS for gene promoter scan, that discriminates among co-regulated promoters by simultaneously considering both cis-acting regulatory features and gene expression. GPS is particularly useful for knowledge discovery in environments with reduced datasets and high levels of uncertainty.

Application of this method to the enteric bacteria Escherichia coli and Salmonella enterica uncovered novel members, as well as regulatory interactions in the regulon controlled by the PhoP protein that were not discovered using previous approaches. The predictions made by GPS were experimentally validated to establish that the PhoP protein uses multiple mechanisms to control gene transcription, and is a central element in a highly connected network.

[B5] Using information theory to search for co-evolving residues in proteins, L. C. Martin, G. B. Gloor, S. D. Dunn and L. M. Wahl.

Some functionally important protein residues are easily detected because they correspond to conserved columns in a multiple sequence alignment (MSA). However, important residues may also mutate, with compensatory mutations occurring elsewhere in the protein that serve to preserve or restore functionality. It is difficult to distinguish these co-evolving sites from other non-conserved sites.

We used Mutual Information (MI) to identify co-evolving positions. Using in silico evolved MSAs, we examined the effects of the number of sequences, the size of the amino acid alphabet and the mutation rate on two sources of background MI: finite sample size effects and phylogenetic influence. We then assessed the performance of various normalizations of MI in enhancing detection of co-evolving positions and found that normalization by the pair entropy was optimal. In real protein alignments, isolated co-evolving pairs were often found to be in contact with each other.
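The normalization reported as optimal divides MI by the pair (joint) entropy of the two columns, MI(X;Y) / H(X,Y), where MI(X;Y) = H(X) + H(Y) - H(X,Y). A minimal Python sketch of that score for an MSA given as equal-length strings follows. Gaps are treated as an ordinary symbol and natural logarithms are used, both assumptions; no correction for the finite-sample or phylogenetic background described above is attempted, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def _entropy(symbols: Sequence) -> float:
    """Shannon entropy (natural log) of the empirical symbol distribution."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def mutual_information(col_x: str, col_y: str) -> float:
    """MI(X;Y) = H(X) + H(Y) - H(X,Y) for two alignment columns."""
    joint = list(zip(col_x, col_y))
    return _entropy(col_x) + _entropy(col_y) - _entropy(joint)


def mi_over_pair_entropy(col_x: str, col_y: str) -> float:
    """MI normalized by the joint (pair) entropy H(X,Y)."""
    h_xy = _entropy(list(zip(col_x, col_y)))
    if h_xy == 0.0:
        return 0.0  # both columns fully conserved: no shared variation to measure
    return mutual_information(col_x, col_y) / h_xy


def rank_pairs(msa: List[str]) -> List[Tuple[float, int, int]]:
    """Score every column pair of an MSA (equal-length strings), best first."""
    columns = ["".join(seq[i] for seq in msa) for i in range(len(msa[0]))]
    scores = [(mi_over_pair_entropy(columns[i], columns[j]), i, j)
              for i in range(len(columns))
              for j in range(i + 1, len(columns))]
    return sorted(scores, reverse=True)


# Toy alignment: columns 0 and 1 covary perfectly, column 2 is conserved.
msa = ["AKL", "AKL", "VEL", "VEL"]
print(rank_pairs(msa)[0])
```

`rank_pairs` scores every column pair, so its cost grows quadratically with alignment length; perfectly covarying columns score 1 and independent or conserved columns score 0.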

 
Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.