SQPATH: A Combined SQL-XPATH Query System for RNAML Data

ID
TR-2004-13
Authors
Chita C, Patel R and Yang J
Publishing date
August 16, 2004
Length
23 pages
Abstract
RNA secondary structure prediction has become a major area of bioinformatics research, since it can be inferred that all functions of a single-stranded RNA are influenced by its secondary structure [29]. Progress in this field has been hindered by, among other things, the lack of a unified repository for exchanging RNA informatics data and the lack of a standardized file format. We advocate such a centralized RNA database and, assuming one exists, investigate which of two query approaches would be faster: (1) storing the indexes in a relational table and using SQL to narrow the candidate answers to only the matching files before running XPATH on the RNAML files themselves, or (2) storing the indexes at the highest (i.e. first) level of an XML file and using XPATH exclusively. We found that storing the indexes in a relational table and using both SQL and XPATH is at least one order of magnitude faster than storing the indexes at the first level of an XML file and using XPATH alone. Furthermore, the gap between the speeds of the two query methods grows with the number of files. We describe the system we built to test our hypothesis, our testing procedure and results, and explore avenues that will allow us to generalize our results to other XML databases.
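The two query strategies compared above can be sketched as follows. This is a minimal illustration, not the SQPATH implementation, and it measures nothing. It assumes a heavily simplified RNAML-like document shape, a hypothetical `rnaml_index` table, and a hypothetical first-level `<index>` element. It uses Python's `sqlite3` and `xml.etree.ElementTree`, which supports only a subset of XPATH. Its purpose is to show where each approach does its work: approach (1) parses only the files that SQL returns, while approach (2) must parse every file to inspect its first-level index.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified RNAML-like documents keyed by file name.
# The first-level <index> element stands in for approach (2)'s index and
# is an assumption of this sketch, not part of the RNAML specification.
DOCS = {
    "a.xml": """<rnaml><index organism="E. coli" type="tRNA"/>
        <molecule id="m1"><sequence>GCGGAUUUAG</sequence></molecule></rnaml>""",
    "b.xml": """<rnaml><index organism="H. sapiens" type="rRNA"/>
        <molecule id="m2"><sequence>AUGGCUACGU</sequence></molecule></rnaml>""",
    "c.xml": """<rnaml><index organism="E. coli" type="rRNA"/>
        <molecule id="m3"><sequence>UUAGCCGAUA</sequence></molecule></rnaml>""",
}

# ElementTree XPath subset: every <sequence> under a top-level <molecule>.
SEQ_PATH = "./molecule/sequence"


def build_index():
    """Populate an in-memory relational index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rnaml_index (file_name TEXT PRIMARY KEY, organism TEXT, type TEXT)"
    )
    for file_name, text in DOCS.items():
        idx = ET.fromstring(text).find("./index")
        conn.execute(
            "INSERT INTO rnaml_index VALUES (?, ?, ?)",
            (file_name, idx.get("organism"), idx.get("type")),
        )
    conn.commit()
    return conn


def sql_then_xpath(conn, organism):
    """Approach (1): SQL narrows the candidate files, then XPath runs
    only on the files the query returned."""
    rows = conn.execute(
        "SELECT file_name FROM rnaml_index WHERE organism = ? ORDER BY file_name",
        (organism,),
    ).fetchall()
    results = []
    for (file_name,) in rows:
        root = ET.fromstring(DOCS[file_name])
        results.extend(seq.text for seq in root.findall(SEQ_PATH))
    return results


def xpath_only(organism):
    """Approach (2): every file is parsed and its first-level index is
    tested with an XPath attribute predicate; no relational table is used."""
    results = []
    for file_name in sorted(DOCS):
        root = ET.fromstring(DOCS[file_name])
        if root.find(f"./index[@organism='{organism}']") is not None:
            results.extend(seq.text for seq in root.findall(SEQ_PATH))
    return results


if __name__ == "__main__":
    conn = build_index()
    print(sql_then_xpath(conn, "E. coli"))
    print(xpath_only("E. coli"))
```

Both functions return the same sequences for the same organism. The difference is that `xpath_only` parses all three documents, while `sql_then_xpath` parses only the two that the SQL query returns. This per-file parsing cost is the part that grows with the number of files.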