Abstract |
Rapid Protein Identification by Mass Spectrometry
When a protein is isolated in an experiment, one would like to know
whether it is already known, i.e., whether it is stored in the
databases of known proteins. If the protein is new, it has to be
investigated from scratch.
A commonly employed approach for database lookup compares the masses
of fragments of the protein (its mass fingerprint) against the
proteins stored in a database. We present and discuss efficient
algorithms that compare a single mass against a given protein
sequence in time sublinear in the sequence length.
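
To make the lookup problem concrete, the following Python sketch decides whether some contiguous fragment of a protein sequence has a given mass. It is only a linear-time sliding-window baseline, not the sublinear algorithms referred to above. The name find_submass and the use of nominal (integer) residue masses are assumptions made for illustration.

```python
# Nominal (integer) residue masses in daltons. Using integers instead of
# exact monoisotopic masses is a simplifying assumption of this sketch.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def find_submass(sequence, target):
    """Return (start, end) such that sequence[start:end] has mass `target`.

    Returns None if no contiguous fragment has that mass. Because all
    residue masses are positive, a two-pointer window suffices and the
    scan takes time linear in the sequence length.
    """
    start = 0
    window = 0  # mass of sequence[start:end + 1]
    for end, residue in enumerate(sequence):
        window += RESIDUE_MASS[residue]
        # Shrink the window from the left while it is too heavy.
        while window > target and start <= end:
            window -= RESIDUE_MASS[sequence[start]]
            start += 1
        if window == target and start <= end:
            return start, end + 1
    return None


if __name__ == "__main__":
    print(find_submass("GASP", 158))  # fragment "AS" has mass 71 + 87
    print(find_submass("GASP", 100))  # no fragment has this mass
```

A sublinear method has to avoid scanning the whole sequence for every query, for example by preprocessing the sequence once; the sketch above makes no attempt at this.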
If the protein does not occur in the databases, we want to establish
its amino acid sequence. This process is called {\em de novo protein
sequencing}. There are algorithms available that can determine the
correct amino acid sequence efficiently, given a noise-free tandem
mass spectrum (MS/MS spectrum) of the protein. However, real-life
MS/MS spectra invariably contain errors, which makes the sequencing
process difficult. In fact, we conjecture that a single MS/MS spectrum
does not contain sufficient information to allow for reliable de novo
sequencing. For this reason, we propose to develop techniques for
parallel protein sequencing, where many MS/MS spectra that belong to
the same protein are sequenced at once.
|