Alan K. Mackworth's Publications

Evaluation of Gene-Finding Programs on Mammalian Sequences

S. Rogic, Alan K. Mackworth, and B. F. F. Ouellette. Evaluation of Gene-Finding Programs on Mammalian Sequences. Genome Research, 11:817–832, 2001.

Abstract

We present an independent comparative analysis of seven recently developed gene-finding programs: FGENES, GeneMark.hmm, Genie, Genscan, HMMgene, Morgan, and MZEF. For evaluation purposes we developed a new, thoroughly filtered, and biologically validated dataset of mammalian genomic sequences that does not overlap with the training sets of the programs analyzed. Our analysis shows that the new generation of programs has substantially better results than the programs analyzed in previous studies. The accuracy of the programs was also examined as a function of various sequence and prediction features, such as G + C content of the sequence, length and type of exons, signal type, and score of the exon prediction. This approach pinpoints the strengths and weaknesses of each individual program as well as those of computational gene-finding in general. The dataset used in this analysis (HMR195) as well as the tables with the complete results are available at http://www.cs.ubc.ca/~rogic/evaluation/.

BibTeX

@Article{GR01,
  author =	 {S. Rogic and Alan K. Mackworth and B. F. F. Ouellette},
  title =	 {Evaluation of Gene-Finding Programs on Mammalian Sequences},
  year =	 {2001},
  journal =	 {Genome Research},
  volume =       {11},
  pages =         {817--832},
  abstract =	 { We present an independent comparative analysis of seven recently developed
                   gene-finding programs: FGENES, GeneMark.hmm, Genie, Genscan, HMMgene, Morgan,
                   and MZEF. For evaluation purposes we developed a new, thoroughly filtered, and 
                   biologically validated dataset of mammalian genomic sequences that does not
                   overlap with the training sets of the programs analyzed. Our analysis shows that
                   the new generation of programs has substantially better results than the programs
                   analyzed in previous studies. The accuracy of the programs was also examined as a
                   function of various sequence and prediction features, such as G + C content of the 
                   sequence, length and type of exons, signal type, and score of the exon prediction. 
                   This approach pinpoints the strengths and weaknesses of each individual program as
                   well as those of computational gene-finding in general. The dataset used in this 
                   analysis (HMR195) as well as the tables with the complete results are available at
                   http://www.cs.ubc.ca/~rogic/evaluation/.},
  bib2html_pubtype ={Refereed Journal},
  bib2html_rescat ={},
}
