The topic of my masters thesis project at George Mason University
is the study of a genetic algorithm approach to predicting protein
structure in abstract proteins folded in 3-dimensional lattices.
This is an important topic in the field of proteomics and one of
the "grand challenges" of molecular biology. If you are
unfamiliar with the protein folding problem, check out my introduction
to protein folding.
Hydrophobic Packing Problems
Because hydrophobicity is thought to influence something like 70%
of a protein's folding behavior, I am discarding the electrodynamics
and mechanical constraints in real world proteins, and abstracting
the proteins as simple chains of hydrophobic or polar amino acids.
Instead of folding these proteins in normal three-space, we will
fold our abstract proteins onto an artificial three dimensional
These are the essentials of the "hydrophobic
packing model" pioneered by K.A. Dill. Hydrophobic packing
models discard the ancillary details of protein folding without
reducing the computational complexity of the problem. Thus, we can
devise problem solving systems which can scale-up
to solving real-world protein folding problems.
Evolutionary Computation Approach
Because the protein folding problem is considered to be an NP-complete
problem, most investigators have given up on trying to find polynomial
time solutions, and turned to massively parallel computers to derive
solutions by brute force. These approaches are unacceptable for
- Interesting proteins are typically large: several thousand amino
acids in length. An exhaustive computer search for the optimal
solution would require billions of years even harnessing all the
computer power on the planet.
- Such approaches contribute nothing to the understanding of
the pathways by which proteins fold. Big powerful computers are
impressive engineering feats, but not scientifically useful because
they can't tell you "why" proteins fold the way they
Other research has turned to non-admissible methods which use clever
heuristics for deriving the structures of proteins. Genetic algorithms,
because they have been used to successfully solve other problems
with NP-complete characteristics, are also a popular approach and
can be further enhanced using heuristics. My research utilizes a
novel genetic algorithm to folding proteins.
No Free Lunch
Unlike exhaustive search techniques, genetic algorithms utilize
non-admissible heuristics and thus cannot gaurentee polynomial
time solutions... and even if a GA arrived at the optimal solution,
it wouldn't know that the solution is optimal because it hasn't
examined all of the conformations possible. Gaurenteeing a solution
is tantemount to saying that you know the optimal solution's fitness
value ahead of time. But since fitness relies on structure, you
would have to know the structure ahead of time as well. This is
the so-called "termination problem" in GAs, and is also
an unsolved problem.