Title: How to let nature to tell the nature -- Genome Analysis based on Learning Algorithm and Coding
Department of Mathematics
Faculty of Education and Integrated Arts and Sciences
Waseda University, Tokyo, Japan
By its nature, research in bioinformatics inevitably involves
discovery science, and it requires novel methods for uncovering
new biological knowledge embedded in natural biodata. In discovery
science, inductive learning (or inductive inference) is one of the
most important principles for implementing the discovery schema.
In this talk, we propose a linear-time algorithm that, given a regular set called a local language, identifies a DFA accepting it from positive data only. Then, taking a specific bio-domain of amino acid sequences (alpha-chain regions in hemoglobin) as an example, we present experimental results in which the obtained DFA correctly identifies 95% of unknown positive data and 96% of negative data.
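To make the idea of identifying a local language from positive data concrete, here is a minimal sketch in Python. It is not the algorithm presented in the talk; it assumes the standard textbook characterization of a local language by its allowed first symbols, last symbols, and adjacent symbol pairs. The function names learn_local_model and accepts, and the toy sample strings, are hypothetical and chosen only for illustration. The learner makes a single pass over the samples, so its running time is linear in their total length, and the acceptor behaves like a DFA whose state is the last symbol read.

```python
def learn_local_model(samples):
    """Collect first symbols, last symbols and adjacent pairs from positive strings.

    Assumption: this is the usual textbook construction for local languages,
    not necessarily the procedure used in the talk.
    """
    initials, finals, pairs = set(), set(), set()
    accepts_empty = False
    for s in samples:
        if not s:
            accepts_empty = True
            continue
        initials.add(s[0])           # symbols allowed at the start
        finals.add(s[-1])            # symbols allowed at the end
        pairs.update(zip(s, s[1:]))  # adjacent symbol pairs that were observed
    return initials, finals, pairs, accepts_empty


def accepts(model, s):
    """Simulate the DFA implied by the model; the state is the last symbol read."""
    initials, finals, pairs, accepts_empty = model
    if not s:
        return accepts_empty
    if s[0] not in initials or s[-1] not in finals:
        return False
    return all(pair in pairs for pair in zip(s, s[1:]))


# Toy positive data (made-up strings over amino acid letters, not real sequences).
model = learn_local_model(["VLSPA", "VLSAA"])
print(accepts(model, "VLSPA"))   # a training string
print(accepts(model, "VLSAAA"))  # unseen string built only from observed pairs
print(accepts(model, "VLPA"))    # contains the unobserved pair ('L', 'P')
```

Because the model records only local information, it generalizes beyond the training strings: any string whose first symbol, last symbol, and adjacent pairs were all observed is accepted, which is what allows unknown positive data to be recognized.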
Throughout the talk, we emphasize the importance of the combinatorial approach to modeling concepts, which in principle consists of a primitive formal system (such as a simple DFA) and its interpretation (such as coding).