Module 4: Protein Structure (2)
Given an energy function (force field) f, we want to use a "genetic algorithm" (GA)
to find its global minimum.
To minimize f over the search space:
Look at a population of candidate solutions (individuals) from that space
Use genetic operators: mutation, cross-over, to generate offspring
Use f to evaluate fitness of all individuals
Select individuals that constitute next generation (survivors)
Parameters to choose: mutation rate, how to mutate, how to cross over
Selection (usually the population size, i.e. the number of individuals, is
kept constant; deterministic or probabilistic choice mechanisms can be used)
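The loop described above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function name genetic_minimize, its parameters, the default values, and the toy objective sphere are all assumptions, and selection here is the deterministic variant (keep the best of parents plus offspring).

```python
import random

def genetic_minimize(f, random_individual, mutate, crossover,
                     pop_size=20, generations=100, mutation_rate=0.1, seed=0):
    """Minimize f with a simple generational scheme (lower f means fitter)."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Pick two distinct parents and recombine them.
            a, b = rng.sample(population, 2)
            child = crossover(a, b, rng)
            # Mutate the child with probability mutation_rate.
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            offspring.append(child)
        # Deterministic selection: the pop_size best of parents and offspring
        # survive, so the population size stays constant.
        population = sorted(population + offspring, key=f)[:pop_size]
    return population[0]

# Toy objective (hypothetical, for illustration only): sum of squares.
def sphere(x):
    return sum(v * v for v in x)

best = genetic_minimize(
    sphere,
    random_individual=lambda rng: [rng.uniform(-5, 5) for _ in range(3)],
    mutate=lambda x, rng: [v + rng.gauss(0, 0.3) for v in x],
    crossover=lambda a, b, rng: [rng.choice(pair) for pair in zip(a, b)],
)
```

Because parents compete with their offspring, the best individual never gets worse from one generation to the next. A probabilistic selection scheme would trade that guarantee for more diversity.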
Historically, GA is defined to operate on binary data.
Thus, individuals are bit vectors
and a good encoding scheme is needed.
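To illustrate why the encoding matters, here is one possible (assumed, not from the lecture) fixed-point encoding of a torsion angle as a bit vector. The function names and the choice of 8 bits are hypothetical.

```python
def encode_angle(angle_deg, bits=8):
    """Map an angle in [-180, 180) to a list of bits, most significant first."""
    levels = 1 << bits
    index = int((angle_deg + 180.0) / 360.0 * levels) % levels
    return [(index >> i) & 1 for i in reversed(range(bits))]

def decode_angle(bit_list):
    """Inverse of encode_angle, up to the quantization step 360 / 2**bits."""
    levels = 1 << len(bit_list)
    index = 0
    for b in bit_list:
        index = (index << 1) | b
    return index * 360.0 / levels - 180.0

def flip_bit(bit_list, position):
    """Bit-flip mutation at the given position."""
    flipped = list(bit_list)
    flipped[position] ^= 1
    return flipped

bits_zero = encode_angle(0.0)                   # 0 degrees as 8 bits
jumped = decode_angle(flip_bit(bits_zero, 0))   # flip the most significant bit
nudged = decode_angle(flip_bit(bits_zero, 7))   # flip the least significant bit
```

With this encoding, flipping the most significant bit moves the angle by 180 degrees while flipping the least significant bit moves it by about 1.4 degrees, so the effect of a mutation depends strongly on how the data are encoded.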
Evolutionary algorithm: like GA, but works directly on non-binary data.
Outline application of EA to tertiary structure prediction.
Initialization: starting population is randomly chosen (could also be
based on statistics from protein database, e.g. PDB)
Evaluate initial population
Generate new individuals
Mutate: replace a torsion angle with a randomly selected value (same as above)
Variation: increment/decrement torsion angle by
Two-point crossover (helps to keep changes in structure reasonably local)
Uniform crossover (50%)
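The four operators listed above could look like this when an individual is a list of torsion angles in degrees. This is a sketch under assumptions: the function names are hypothetical, angles are taken to lie in [-180, 180), and the increment size is left as the parameter step because the notes do not give its value.

```python
import random

def mutate(angles, rng):
    """Replace one torsion angle with a random value in [-180, 180]."""
    child = list(angles)
    child[rng.randrange(len(child))] = rng.uniform(-180.0, 180.0)
    return child

def vary(angles, rng, step):
    """Increment or decrement one angle by step degrees, wrapped to [-180, 180)."""
    child = list(angles)
    i = rng.randrange(len(child))
    child[i] = (child[i] + rng.choice((-step, step)) + 180.0) % 360.0 - 180.0
    return child

def two_point_crossover(a, b, rng):
    """Copy a, but take the contiguous segment between two cut points from b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]

def uniform_crossover(a, b, rng):
    """Take each position from a or from b with probability 50% each."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
```

Two-point crossover exchanges one contiguous stretch of the chain, which is why it keeps changes local. Uniform crossover mixes positions independently and can alter the whole fold at once.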
Results (how good is this?)
Applying EA (1000 generations, 10 individuals) gives very bad results
(structures found are quite different from native Crambin).
The structure is difficult to predict
A high-resolution structure is known (1.5 Å)
The energy model is not good enough. It turns out that, according to the
energy model, the energy of the resulting structure is much lower than the
energy of the native structure.
Note: The approach is much more successful for side-chain packing
Note: The EA can generally be improved by using more sophisticated (problem-specific) search operators (here: "local twist").
Protein Secondary Structure Prediction
Based on the probability of encountering certain AAs in a given secondary
structure (estimated from the PDB)
This method gives only about 50% prediction accuracy
e.g. predict as an α-helix: segment of 6
Chou-Fasman method (a better one)
Don't look at single AAs, but look at the context (a window of AAs).
Normalize frequency counts by the frequency of the AA in a family or database of
Based on normalized frequency counts use rules that predict structure elements
based on local contexts.
E[Pα] > E[Pβ]
Does not include proline.
Accuracy ≈ 63%
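The normalization and the window comparison E[Pα] > E[Pβ] can be sketched as below. This is only an illustration of those two steps, not the full Chou-Fasman rule set. The function names are hypothetical, and the two labeled sequences are made-up toy data (H = helix, E = strand), not real statistics.

```python
from collections import Counter

def propensities(sequences, labels, state):
    """Normalized frequency of each AA in a state:
    (fraction of `state` residues that are aa) / (fraction of all residues that are aa).
    Assumes `state` occurs at least once in the labels."""
    total = Counter()
    in_state = Counter()
    for seq, lab in zip(sequences, labels):
        for aa, s in zip(seq, lab):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_total = sum(total.values())
    n_state = sum(in_state.values())
    return {aa: (in_state[aa] / n_state) / (total[aa] / n_total) for aa in total}

def window_mean(seq, start, width, table):
    """Average propensity over the window seq[start:start + width]."""
    window = seq[start:start + width]
    return sum(table[aa] for aa in window) / len(window)

# Toy, made-up training data for illustration only.
sequences = ["AEALKA", "VIVTVY"]
labels = ["HHHHHH", "EEEEEE"]
p_alpha = propensities(sequences, labels, "H")
p_beta = propensities(sequences, labels, "E")

# Compare mean helix and strand propensities over a window of 6 residues.
mean_alpha = window_mean("AEALKA", 0, 6, p_alpha)
mean_beta = window_mean("AEALKA", 0, 6, p_beta)
helix_favoured = mean_alpha > mean_beta
```

A propensity above 1 means the AA occurs in that structure more often than its overall frequency would suggest. In practice the tables would be estimated from a large structure database rather than two short strings.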
Neural Network approaches
Neural networks are typically organized in layers. Layers are made
up of a number of interconnected 'nodes', which contain an 'activation
function'. Patterns are presented to the network via the 'input layer',
which communicates to one or more 'hidden layers' where the actual processing
is done via a system of weighted 'connections'. The hidden layers then
link to an 'output layer' where the answer is output as shown in the graph
Neural networks are a nature-inspired method
Parameters are learned from data whose correct secondary structure is known.
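The layered organization described above can be sketched as a forward pass through one hidden layer. Everything specific here is an assumption for illustration: the window of 7 residues, the one-hot input encoding, 5 hidden nodes, sigmoid activation, and 3 output classes (for example helix, strand, coil). The weights are random and untrained, so the output is not a meaningful prediction; training would fit them to data with known secondary structure.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(window):
    """Input layer: 20 values per residue, with a 1.0 marking the residue type."""
    vec = []
    for aa in window:
        vec.extend(1.0 if aa == a else 0.0 for a in AMINO_ACIDS)
    return vec

def dense(inputs, weights, biases):
    """Weighted connections: one weighted sum plus bias per node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def sigmoid(values):
    """Activation function applied at each hidden node."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def softmax(values):
    """Turn output-layer values into probabilities that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]

def random_layer(n_out, n_in, rng):
    """Untrained layer: small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(window, hidden, output):
    """Input layer -> hidden layer -> output layer."""
    h = sigmoid(dense(one_hot_window(window), *hidden))
    return softmax(dense(h, *output))

rng = random.Random(0)
hidden = random_layer(5, 7 * len(AMINO_ACIDS), rng)
output = random_layer(3, 5, rng)
probs = forward("AEALKAV", hidden, output)  # one probability per class
```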