Protein Identification

Subject:	Rapid Protein Identification by Mass Spectrometry
Presenter:	Mark Cieliebak

Abstract	When a protein is isolated in an experiment, one would first like to know whether it is already known, i.e., whether it is stored in a database of known proteins. If the protein is new, it must be investigated from scratch. A commonly employed approach to database lookup compares the masses of fragments of the protein (its mass fingerprint) against the proteins stored in a database. We will present and discuss efficient algorithms that compare a single mass against a given protein sequence in time sublinear in the sequence length.

If the protein does not occur in the databases, we want to determine its amino acid sequence. This process is called de novo protein sequencing. Existing algorithms can determine the correct amino acid sequence efficiently, given a noise-free tandem mass spectrum (MS/MS spectrum) of the protein. However, real-life MS/MS spectra are always error-prone, which makes sequencing difficult. In fact, we conjecture that a single MS/MS spectrum does not contain enough information for reliable de novo sequencing. We therefore propose to develop techniques for parallel protein sequencing, in which many MS/MS spectra belonging to the same protein are sequenced together.
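To make the database-lookup step concrete, here is a minimal Python sketch of the underlying question: does a protein sequence contain a contiguous fragment whose mass equals a given value? It assumes nominal (integer) residue masses and treats a fragment's mass as the plain sum of its residue masses, ignoring the terminal water, ion charge, and modifications. The names `RESIDUE_MASS`, `find_fragment`, and `fingerprint_hits` are hypothetical. Because every residue mass is positive, a sliding window answers the question in time linear in the sequence length; this is a simple baseline, not the sublinear-time algorithms presented in the talk.

```python
from typing import Iterable, Optional, Tuple

# Nominal (integer) residue masses in daltons for the 20 standard amino acids.
# Simplifying assumption: a fragment's mass is just the sum of these values
# (no terminal water, charge, or modifications).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def find_fragment(sequence: str, target: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) such that sequence[start:end] has total mass
    equal to target, or None if no contiguous fragment matches."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    start = 0
    window = 0  # mass of sequence[start:end]
    for end, mass in enumerate(masses, start=1):
        window += mass
        # All masses are positive, so dropping residues on the left
        # can only lower the window mass.
        while window > target and start < end:
            window -= masses[start]
            start += 1
        if window == target and start < end:
            return start, end
    return None


def fingerprint_hits(sequence: str, fingerprint: Iterable[int]) -> int:
    """Count how many masses of a fingerprint occur as fragment masses."""
    return sum(1 for mass in fingerprint if find_fragment(sequence, mass) is not None)
```

For example, `find_fragment("GASP", 158)` returns `(1, 3)`, the fragment "AS" (71 + 87 Da), and `fingerprint_hits("GASP", [158, 100, 312])` returns 2, since no fragment of "GASP" weighs 100 Da. Ranking database entries by such a hit count is one simple way to use a mass fingerprint for lookup.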
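The noise-free case of de novo sequencing can be illustrated with an idealized ladder of prefix masses, where each peak gives the cumulative mass of the first k residues. The Python sketch below is a hypothetical simplification, not the algorithms referred to in the abstract: it assumes the spectrum has already been reduced to such a ladder (no suffix ions, ion-type offsets, or charges), uses nominal integer residue masses, and reads one residue from each gap between consecutive peaks. The names `MASS_TO_RESIDUES` and `read_prefix_ladder` are illustrative.

```python
from typing import Dict, List

# Nominal (integer) residue masses in daltons. At this resolution,
# I/L (113) and K/Q (128) cannot be told apart.
RESIDUE_MASS: Dict[str, int] = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Reverse lookup: mass -> all residue letters with that mass (e.g. 113 -> "IL").
MASS_TO_RESIDUES: Dict[int, str] = {}
for residue, mass in sorted(RESIDUE_MASS.items()):
    MASS_TO_RESIDUES[mass] = MASS_TO_RESIDUES.get(mass, "") + residue


def read_prefix_ladder(prefix_masses: List[int]) -> List[str]:
    """Read a sequence from an idealized, noise-free ladder of prefix masses.

    The ladder should contain 0 and the cumulative residue mass after each
    position. Each gap between consecutive peaks is read as one residue;
    ambiguous masses yield all candidate letters (e.g. "IL").
    Raises ValueError if a gap matches no single residue.
    """
    ladder = sorted(set(prefix_masses))
    residues = []
    for lo, hi in zip(ladder, ladder[1:]):
        gap = hi - lo
        if gap not in MASS_TO_RESIDUES:
            raise ValueError(f"gap of {gap} Da matches no single residue")
        residues.append(MASS_TO_RESIDUES[gap])
    return residues
```

With the ladder for "PEPTIDE", `read_prefix_ladder([0, 97, 226, 323, 424, 537, 652, 781])` returns `['P', 'E', 'P', 'T', 'IL', 'D', 'E']`; isoleucine and leucine share a nominal mass, so that position stays ambiguous. A single spurious peak, as in `[0, 97, 150, 226]`, produces a 53 Da gap that matches no residue, so the sketch raises `ValueError`. This fragility hints at why error-prone real-life spectra make sequencing difficult.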