The science of understanding the complex, spontaneous assembly of proteins. Proteins are long chains of amino acids. These chains are very flexible and can take on a large number of different shapes (just as a piece of string can be coiled and knotted in many different ways). With proteins, however, only one specific structure is right. This is called the fold, or native state. How a protein arrives at this native state without spending an eternity searching through the nearly infinite number of wrong folds is the subject of intense study (this problem is sometimes referred to as Levinthal's Paradox). This field is called protein folding.

One way of studying protein folding is by studying how a protein unfolds! Proteins are subjected to chemical denaturants or high heat, causing them to fall apart. This reverse process is studied with various techniques to understand how a protein starts from a disrupted state and finds its home.
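As an illustration of how chemical unfolding data are often interpreted, here is a minimal Python sketch assuming the protein unfolds in a single two-state step and that its unfolding free energy falls linearly with denaturant concentration (the "linear extrapolation model", one common analysis among several). The function name, the parameter values and the 25 °C default are assumptions made for this example only.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def fraction_unfolded(denaturant_molar, dg_water, m_value, temp_k=298.15):
    """Fraction of molecules unfolded at a given denaturant concentration.

    Assumes two-state unfolding (N <-> U) and the linear extrapolation model:
        dG_unfold([D]) = dG_water - m * [D]
    dg_water: unfolding free energy with no denaturant, J/mol (positive = stable)
    m_value:  how fast dG_unfold drops with denaturant, J/(mol*M)
    """
    dg_unfold = dg_water - m_value * denaturant_molar
    # K_unfold = exp(-dG/RT), so fraction unfolded = K / (1 + K) = 1 / (1 + exp(dG/RT))
    return 1.0 / (1.0 + math.exp(dg_unfold / (R * temp_k)))

# Hypothetical protein: dG_water = 20 kJ/mol, m = 5 kJ/(mol*M), so the midpoint is at 4 M
for conc in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(conc, round(fraction_unfolded(conc, 20000.0, 5000.0), 3))
```

Under these assumptions the curve is sigmoidal, with half the molecules unfolded exactly where dG_water equals m times the denaturant concentration.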

This will ultimately be important for efforts such as the Human Genome Project, where it would be advantageous to predict a protein's structure from its sequence. This is a potentially huge frontier in structural biology, and various computational and bioinformatic database techniques are being applied to solve the problem in silico.

See also:

Molten Globule


Visualise a protein as beads on a string (Synteny :) where each bead is either sticky (hydrophobic) or polar. This is the HP model - hydrophobic and polar amino acids only.
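To make the HP model concrete, here is a minimal Python sketch of how one conformation is scored in the common textbook version of the model: the chain is laid out as a self-avoiding walk on a 2D square lattice, and every pair of H beads that touch on the lattice without being neighbours along the chain counts as one sticky contact worth -1. The 2D lattice, the unit energy and the function name `hp_energy` are assumptions of this illustration.

```python
def hp_energy(sequence, coords):
    """Energy of one HP-model conformation on a 2D square lattice.

    sequence: string of 'H' (hydrophobic, sticky) and 'P' (polar) residues
    coords:   list of (x, y) tuples, one per residue, forming a self-avoiding walk
    Each H-H pair that are lattice neighbours but not chain neighbours adds -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("chain overlaps itself")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Only look right and up, so each lattice contact is counted once
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbour)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy

# A 4-bead chain folded into a square brings its two H ends together
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
print(hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)]))  # 0
```

The sketch only scores a conformation it is given; a folding simulation in this model would search over many such walks for the lowest-energy, most compact one.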

This string starts as a fluctuating coil like a sea snake, but the repulsion of some charges and the attraction between others combine with the sticky patches to compact the string. Several schools of thought exist on how this occurs, including hydrophobic collapse and the funnel hypothesis.

Most importantly, there is a difference between small and large proteins. Large ones may fold in several stages, possibly even by domain association. Smaller proteins often go from unfolded to folded (or native state) in one step - and very quickly too.
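For a small protein that folds in a single step, the simplest kinetic description is a two-state reaction U <-> N with a folding rate constant k_f and an unfolding rate constant k_u. Starting from a fully unfolded population, the fraction folded rises as a single exponential with observed rate k_f + k_u towards its equilibrium value k_f / (k_f + k_u). The Python sketch below assumes exactly this two-state scheme; the rate constants in the usage line are hypothetical.

```python
import math

def fraction_folded(t, k_fold, k_unfold):
    """Fraction of molecules folded at time t (s) for a two-state U <-> N reaction
    that starts fully unfolded. Rate constants are in 1/s."""
    k_obs = k_fold + k_unfold   # observed relaxation rate
    f_eq = k_fold / k_obs       # equilibrium fraction folded
    return f_eq * (1.0 - math.exp(-k_obs * t))

# Hypothetical fast folder: k_f = 1000 /s, k_u = 1 /s, half-time about 0.7 ms
half_time = math.log(2) / (1000.0 + 1.0)
print(half_time, fraction_folded(half_time, 1000.0, 1.0))
```

No intermediate state appears anywhere in this model, which is what "from unfolded to folded in one step" means in kinetic terms.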

The time scale of folding is on the order of milliseconds, whereas the famous Levinthal paradox concluded that a random search would take many aeons. Therefore, some sort of ordered pathway must exist - such as hierarchical folding.


Protein folding refers to the process by which a protein assembles itself into its correct native structure by arranging its chain of amino acids into structural motifs and orientating them relative to one another. Some of you may be picturing a kind of glorified origami, but this field of biochemical research is more complicated than its name might suggest (origami is a valid metaphor, just bear in mind that instead of hands, this kind of folding is driven by molecular interactions and governed by various kinetic factors).

Proteins are synthesised as linear chains of amino acids which are only of biological use once they have folded into a stable specific 3-dimensional conformation. For example, a digestive enzyme such as pepsin would be of little use as a loose chain of amino acids flapping around as they please in the stomach but must instead fold to the correct shape in order to bind to its substrates and break them down. This folding must be precise in order for the protein to be functional.

Protein folding is a spontaneous, ordered and reversible process.

Protein folding has long been known to be spontaneous in vitro (Anfinsen 1973), with no other factors required for correct folding in solution. Therefore all the information required for correct folding ultimately exists within the protein itself, in its primary structure (the linear sequence of amino acids that make up the protein, encoded by DNA). Specifically, folding refers to the formation of secondary and tertiary structure. Secondary structure refers to structural motifs such as alpha helices and beta sheets, which can form repeated structural patterns within a protein. Tertiary structure relates to the arrangement of these motifs and the overall arrangement of the protein in space, its overall shape, often referred to as its fold. The observation that primary structure determines a protein's fold and contains all the information necessary for the protein to fold correctly has wide-ranging implications. It suggests that at some point in the future we may be able to predict a protein's fold from its amino acid sequence, and because we can determine amino acid sequence from gene sequence, we may potentially be able to determine a protein's structure from a gene sequence (see below).
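The step from gene sequence to amino acid sequence is the easy part of that chain of reasoning. A minimal Python sketch, assuming a coding DNA sequence that is already in frame and free of introns, and using the standard genetic code (some organisms and organelles use variant codes), might look like this; `translate` and `CODON_TABLE` are names chosen for the example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(coding_dna):
    """Translate an in-frame coding DNA sequence up to its first stop codon.

    Raises KeyError on characters other than A, C, G, T.
    """
    dna = coding_dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

print(translate("ATGGCCAAATAA"))  # MAK
```

Predicting how the resulting chain folds is the hard, unsolved half of the problem that the rest of this node is about.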

Is folding fast or slow?

Well, actually it's both. The Levinthal Paradox gives us one perspective on this question. The calculation assumes that in a protein of n amino acids each amino acid has two rotatable bonds (admittedly a gross underestimate considering the long side chains of some amino acid residues) and that each bond has three stable conformations (another underestimate). Therefore there are $3^{2n}$ possible conformations in a protein. Now assume that the bonds in a protein can reorientate at the same rate as a single bond, sampling around $10^{13}$ conformations a second (almost certainly an overestimate). The time taken for a protein to search all the conformations available to it can thus be expressed as $t = 3^{2n}/10^{13}$ seconds.

Try this for a small protein of 100 amino acids. Can't be bothered? Well, lucky I'm here: the answer is around $2.7 \times 10^{82}$ seconds. To put things in perspective, that's longer than the universe is thought to be old. Thus even with assumptions that are grossly inaccurate in favour of faster-than-actual folding, a small 100-residue protein would need longer than the age of the universe to guarantee finding its native conformation by random search. In the more complicated reality of protein dynamics, folding by exhaustive random reorientation of bonds would take even longer. It has therefore been concluded that folding is not a random process, and that a folding pathway must exist to guide it.
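For anyone who can be bothered, here is a minimal Python sketch of the same arithmetic, $t = 3^{2n}/10^{13}$. It works in base-10 logarithms because $3^{2n}$ outgrows the range of a float for chains longer than roughly 320 residues. The function name and the 13.8-billion-year age of the universe are assumptions added for the comparison.

```python
import math

# Age of the universe, taken as about 13.8 billion years, in seconds
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600

def levinthal_log10_time(n_residues, bonds_per_residue=2, states_per_bond=3,
                         conformations_per_s=1e13):
    """log10 of the time (s) to visit every conformation once: t = 3^(2n) / 10^13."""
    log10_conformations = bonds_per_residue * n_residues * math.log10(states_per_bond)
    return log10_conformations - math.log10(conformations_per_s)

print(f"100 residues: about 10^{levinthal_log10_time(100):.1f} s")     # 10^82.4 s
print(f"universe age: about 10^{math.log10(AGE_OF_UNIVERSE_S):.1f} s")  # 10^17.6 s
```

$10^{82.4}$ s is about $2.7 \times 10^{82}$ s, matching the figure above and exceeding the age of the universe by more than 60 orders of magnitude.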

Protein folding actually takes milliseconds to seconds. However, although the early protein-folding scientists questioned how protein folding could be so fast, later studies showing that isolated secondary structure such as alpha helices could fold in nanoseconds raised the opposite question: why is the folding of whole proteins so slow in comparison? The reason is that an energy barrier exists in the folding pathway, as proteins must achieve an energetically unfavourable transition state on their way to their final conformation. The transition state only lasts for around a picosecond, but the difference in stability between the transition state and the unfolded state (i.e. the difference in free energy) represents a barrier that must be overcome for folding to proceed.
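The reason a barrier matters so much is that rates depend exponentially on its height. A minimal Python sketch, assuming a simple Arrhenius-style rate law, rate = A exp(-dG_barrier / RT), with the same prefactor A before and after the barrier changes (the appropriate prefactor for protein folding is itself debated, which is why only ratios are computed here), shows how quickly a modest barrier slows things down. The function and variable names are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def slowdown_factor(extra_barrier_kj_per_mol, temp_k=298.15):
    """Factor by which a rate drops when its free-energy barrier rises by the given amount.

    Assumes rate = A * exp(-dG_barrier / RT) with an unchanged prefactor A,
    so A cancels in the ratio of the two rates.
    """
    return math.exp(extra_barrier_kj_per_mol * 1000.0 / (R * temp_k))

# Barrier increase that costs one factor of ten in rate at 25 degrees C (about 5.7 kJ/mol)
per_decade = R * 298.15 * math.log(10) / 1000.0
print(round(per_decade, 2))
for barrier in (10.0, 20.0, 40.0):
    print(barrier, f"{slowdown_factor(barrier):.3g}")
```

On this assumption every extra 5.7 kJ/mol or so of barrier costs another factor of ten, so a barrier of a few tens of kJ/mol is enough to stretch nanosecond-scale local events into the millisecond-to-second range.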

Protein folding pathways

Since the 1980s there have been two main contrasting proposals for the protein folding pathway:

The framework model - secondary structures are proposed to form first and dock with each other to influence the folding of other parts of the protein into the native state.

The hydrophobic collapse model - the protein collapses into a compacted state due to the hydrophobic effect and thus limits the conformational search for the native state.

However, experimental evidence does not support the presence of either tightly compacted structures lacking secondary structure or expanded molecules with highly ordered secondary structure. In addition, a tightly compacted collapsed state would limit the reorganisation of structure and thus not favour folding, while evidence suggests that secondary structure present in unfolded states still requires hydrophobic interactions for stability.

Nucleation condensation - a unifying mechanism for protein folding.

As with many directly opposing scientific theories, the answer appears to lie somewhere in between the two extremes. The nucleation condensation theory arose in the 1990s and has been developed further in recent years. Proteins such as chymotrypsin inhibitor 2 were found to fold without folding intermediates (described as having two-state kinetics), while phi-value analysis of the transition state (an unstable conformation that lasts for around a picosecond) showed that secondary and tertiary structure form in parallel as the protein undergoes a general collapse. Molecular dynamics simulations of unfolding have provided further atomic-resolution support for this experimental work, which is also in agreement with general kinetic models.

In brief, this model suggests that patches of residual structure in the unfolded state, such as hydrophobic clusters and short alpha helices, interact with each other through long-range contacts (long-range in terms of sequence) to form a nucleus of native-like structure in the transition state. Nucleus formation is the rate-determining step of folding and represents the energy barrier discussed previously. Secondary and tertiary structure formed in the nucleus aids the formation of further native structure through further long-range contacts, for example via side-chain interactions; this is referred to as contact-assisted structure formation. Multi-domain proteins can use this mechanism to fold each domain separately in a localised manner. Although the sequences of folding nuclei are not conserved from protein to protein, their secondary structural motifs do appear to be conserved. Because secondary structure can be predicted from amino acid sequence by methods of varying accuracy, there is potential for identifying the structure of folding nuclei, from which overall structure prediction may eventually be possible.
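Phi-value analysis compares how much a mutation destabilises the transition state with how much it destabilises the native state, both relative to the unfolded state. A minimal Python sketch, assuming a two-state folder and an unchanged rate prefactor (so the barrier change follows from the ratio of folding rates), computes phi for one mutation. The function name and inputs are hypothetical; real analyses need mutations that destabilise the protein enough for the ratio to be reliable, and intermediate phi values are open to interpretation.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)

def phi_value(kf_wild_type, kf_mutant, ddg_equilibrium, temp_k=298.15):
    """Phi-value of one mutation in a two-state folder.

    kf_wild_type, kf_mutant: folding rate constants (1/s)
    ddg_equilibrium: destabilisation of the native state relative to the unfolded
                     state caused by the mutation (kJ/mol, must be non-zero)
    Phi near 0: the mutated site is still unfolded-like in the transition state.
    Phi near 1: the site already has native-like structure in the transition state.
    """
    # Change in the folding barrier from the ratio of folding rates (same prefactor assumed)
    ddg_transition = R * temp_k * math.log(kf_wild_type / kf_mutant)
    return ddg_transition / ddg_equilibrium

# Hypothetical mutation that leaves the folding rate unchanged: phi = 0
print(phi_value(100.0, 100.0, 4.0))
```

Mapping phi across many mutations is what revealed the diffuse, partly formed nucleus described above, rather than a fully formed piece of secondary structure.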

A Helping Hand

Chaperones such as the Hsp70 family are protein molecules required for efficient folding in vivo. They do not influence which fold a protein adopts, but they are important in allowing folding to occur in the intracellular environment. They bind to highly hydrophobic sequences and thus recognise unfolded proteins, which would otherwise have such sequences buried in their core. Because proteins are synthesised as a chain of amino acids emerging from a ribosome, it is important that a chaperone binds to the emerging chain to prevent premature folding before all the "information" (i.e. the complete amino acid sequence) is present. In addition, the cell is a highly crowded environment (~300 g/l of protein and other macromolecules), which increases the likelihood of unfolded protein structures associating through hydrophobic interactions and forming aggregates. It is the role of chaperones to sequester proteins and prevent this type of "clumping". Chaperones bind and stabilise proteins in unstable states and, through regulated binding and release, facilitate their correct folding.
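To get a feel for what "highly hydrophobic sequences" means, one crude approach is a sliding-window average of a hydropathy scale along the chain. The Python sketch below uses the Kyte-Doolittle scale; the window of 7 residues and the threshold of 1.6 are arbitrary choices for illustration, and real chaperone binding preferences are more specific than a simple hydropathy average, so this is not how Hsp70 actually selects its substrates.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(sequence, window=7, threshold=1.6):
    """Return (start, segment, mean hydropathy) for every window of the given
    length whose mean Kyte-Doolittle value exceeds the threshold."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean > threshold:
            hits.append((start, segment, round(mean, 2)))
    return hits

# Hypothetical sequence: a leucine stretch flanked by aspartates
for hit in hydrophobic_windows("DDDDDDDLLLLLLLDDDDDDD"):
    print(hit)
```

In a folded protein, stretches like the leucine run would normally be buried in the core; exposed in a nascent or unfolded chain, they are the kind of sticky patch that can drive aggregation.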

So why is this all important?

Well, if you've read this far (or skipped right to the end), you might want to know what implications this has in the "real world". As mentioned previously, proteins need to fold properly to be functional, so unsurprisingly there are diseases caused by protein misfolding. As misfolded proteins aggregate, these aggregates adsorb other important macromolecules, which damages and kills cells. Protein aggregates released from dead cells can in extreme cases damage tissues such as the brain, which is particularly vulnerable because it depends on a highly organised network of nerve cells to function. Thus misfolding diseases such as Alzheimer's disease, Huntington's disease and prion diseases (such as Creutzfeldt-Jakob disease) manifest as neurodegeneration and dementia. Possible treatments for these diseases could exploit detailed knowledge of protein folding to prevent abnormal folding.

As mentioned before, predicting protein structure from DNA or amino acid sequence will rely on a detailed understanding of how proteins fold. Being able to predict a protein's fold, and thus its structure, would mean that, in conjunction with the Human Genome Project (or the genome sequence of any other organism for that matter), the structures of proteins encoded by genes of unknown function could be determined without having to isolate the proteins themselves. Such structures are likely to suggest the function (or at least a class of function) of the encoded protein and could lead to identifying the roles and modes of action of genes for which there is currently little information.

Despite its simplistic name, protein folding is a very complex and wide-reaching field of study providing many exciting prospects for the future of biology, biochemistry and molecular biology.


Daggett V and Fersht A R, Is there a unifying mechanism for protein folding?, Trends in Biochemical Sciences, Vol. 28 No. 1, January 2003, pp. 18-25.

Agashe V R and Hartl F-U, Roles of molecular chaperones in cytoplasmic protein folding, Seminars in Cell & Developmental Biology, Vol. 11, 2000, pp. 15-25.

Anfinsen C B, Principles that govern the folding of protein chains, Science, Vol. 181, 1973, pp. 223-230.

Alberts et al., Molecular Biology of the Cell (4th edition), Garland Science, USA, 2002.

Voet D and Voet J G, Biochemistry (2nd edition), John Wiley & Sons, Inc., 1995.

Lectures from Dr S E Radford at the University of Leeds.
