Evaluating and Improving the Accuracy of Computational Gene-Finding on Mammalian DNA Sequences
(Sanja Rogic's M.Sc. work, supervised by Alan Mackworth and Francis Ouellette)
My Master's thesis has two parts. The first presents an evaluation and comprehensive analysis of the current generation of gene-finding programs. For this purpose, a new, thoroughly filtered and biologically validated test dataset of genomic sequences was assembled. The basic prediction accuracy of each tested program was calculated, and the relationships between various sequence and prediction features and the programs' accuracy were analyzed.
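To make "basic prediction accuracy" concrete, here is a minimal sketch of nucleotide-level sensitivity and specificity, two measures commonly used to score gene-finding programs. The abstract does not say which measures the thesis uses, so this choice is an assumption, as are the function names and the representation of exons as inclusive (start, end) coordinate pairs. In the gene-finding convention followed here, specificity is the fraction of predicted coding nucleotides that are actually coding.

```python
def coding_positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = correctly predicted coding nucleotides / annotated coding nucleotides
    specificity = correctly predicted coding nucleotides / predicted coding nucleotides
    """
    actual = coding_positions(annotated_exons)
    predicted = coding_positions(predicted_exons)
    true_positives = len(actual & predicted)
    sensitivity = true_positives / len(actual) if actual else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical example: two annotated exons, only the first one predicted.
sn, sp = nucleotide_accuracy([(1, 10), (21, 30)], [(1, 10)])
print(sn, sp)
```

In this hypothetical example the prediction recovers half of the annotated coding nucleotides without predicting any non-coding ones.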
The second part of the thesis presents the development and results of methods for combining the predictions of two gene-finding programs. Three methods were developed, each with some advantages over the other two, and each offering higher prediction accuracy on the test dataset than any gene-finding program currently available.
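As an illustration of what combining two programs' predictions can look like, the sketch below merges two exon lists by intersection and by union. This is a generic approach and is not claimed to be any of the three methods developed in the thesis; the function name and the (start, end) exon representation are assumptions made for the example.

```python
def combine_exons(exons_a, exons_b):
    """Combine exon predictions from two programs.

    Returns two sorted lists:
      agreed - exons predicted identically by both programs (intersection)
      either - exons predicted by at least one program (union)
    """
    set_a, set_b = set(exons_a), set(exons_b)
    agreed = sorted(set_a & set_b)
    either = sorted(set_a | set_b)
    return agreed, either


# Hypothetical predictions from two programs on the same sequence.
program_a = [(100, 250), (400, 520), (900, 1010)]
program_b = [(100, 250), (410, 520), (900, 1010)]

agreed, either = combine_exons(program_a, program_b)
print(agreed)  # exons both programs agree on
print(either)  # exons reported by at least one program
```

Keeping only the agreed exons tends to reduce false positives at the cost of missing real exons, while the union does the opposite, so a practical combination method has to balance the two.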