Frameshifting is a subtle technique some micro-organisms have evolved to save space. Imagine I could compress this node by writing the second and third sentences inside the first, reusing the same letters (up to) three times. This doesn't work for any language we know, because it allows neither gaps nor variable word length. Indeed, all words would have to be exactly 3 letters long; only children's books with a fondness for cats and their seating arrangements would qualify:

	CAT SAT MAT HAT BAT RAT FAT
	ATS ATM ATH ATB ATR ATF AT <--
	TSA TMA THA TBA TRA TFA T  <--
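
If you want to play with this yourself, here is a minimal Python sketch that reads one string in all three frames. The function name `reading_frames` and its `word_len` parameter are my own inventions for illustration, not anything standard.

```python
def reading_frames(text, word_len=3):
    """Split text into words of word_len letters, once per starting offset."""
    frames = []
    for offset in range(word_len):
        shifted = text[offset:]
        # Chop the shifted text into consecutive words; the last may be short.
        words = [shifted[i:i + word_len] for i in range(0, len(shifted), word_len)]
        frames.append(words)
    return frames


message = "CATSATMATHATBATRATFAT"
for frame in reading_frames(message):
    print(" ".join(frame))
```

Run on the cat sentence, it should print the same three rows as the table above, ragged ends included.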

Now the apparent paradox is that this kind of arrangement seems incredibly difficult to evolve. After all, finding three different messages encoded in the same random string seems very unlikely. However, if you use the usual simplistic probability arguments (P(sequence) = 1 / Alphabet ^ Length) then frameshifting makes no difference.

Each 3 letter DNA word corresponds to one protein letter; 10 letter proteins need 30 letter genes. If you want three 100-letter proteins encoded by only 302 bases (work it out: 300 for the first frame, plus one extra base for each shift) and you randomly pick (and replace) bases from a large, uniform pool, then the probability is 1 in 4^302 of getting a particular set of 3 frameshifted proteins. Ignoring stop codons, of course. It isn't 1 in (4^302)^3 (or 4^302 x 4^302 x 4^302) or any other cunning combination.
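
The counting can be checked by brute force on a toy scale. The sketch below is only an illustration under the same simplistic assumptions (uniform random bases, stop codons ignored); the helper names `bases_needed` and `frame_reads`, and the target string, are made up for the example. It enumerates every possible sequence long enough to hold three overlapping 2-codon messages and counts how many reproduce one particular set of three frame readings.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGT"


def bases_needed(codons, frames=3):
    """Bases for `frames` overlapping messages, each shifted by one base."""
    return 3 * codons + (frames - 1)


def frame_reads(seq, codons):
    """The stretch of bases read in each of the three frames."""
    return tuple(seq[offset:offset + 3 * codons] for offset in range(3))


codons = 2
length = bases_needed(codons)           # 3 * 2 + 2 = 8 bases
target = frame_reads("ACGTACGT", codons)  # an arbitrary set of 3 messages

# Count the sequences (out of 4^length) that give exactly these 3 readings.
hits = sum(
    1
    for bases in product(BASES, repeat=length)
    if frame_reads("".join(bases), codons) == target
)
probability = Fraction(hits, 4 ** length)
print(length, hits, probability)
```

Exactly one sequence out of 4^8 matches, so the chance is 1 in 4^(number of bases), with no extra penalty for reading the string three times. By the same formula, `bases_needed(10)` gives 32 bases for three overlapping 10-letter proteins, against 30 for a single one.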

So why aren't all genomes constructed like this? Well, life doesn't rely on bad probability calculations. This arrangement is only workable for organisms that enjoy surprises: a change to one base can alter all 3 proteins, which is three times the mutation rate, effectively. Well, not quite, since you have about 1/3 the number of bases to get mutated... but the point is that frameshifting might well be improbable. It just isn't impossible for such simple reasons.
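
To see the "surprise" in action, here is a small sketch using the cat sentence again. It changes one letter and counts how many words differ in each frame. The helper names `frames` and `changed_words` are hypothetical, and counting changed words is only a stand-in for changed proteins, since in real DNA some codon changes are silent.

```python
def frames(text, word_len=3):
    """Words of word_len letters for each of the word_len starting offsets."""
    return [
        [text[i:i + word_len] for i in range(offset, len(text), word_len)]
        for offset in range(word_len)
    ]


def changed_words(original, mutated):
    """Number of differing words in each frame between two equal-length texts."""
    return [
        sum(a != b for a, b in zip(frame_a, frame_b))
        for frame_a, frame_b in zip(frames(original), frames(mutated))
    ]


original = "CATSATMATHATBATRATFAT"
mutated = original[:4] + "I" + original[5:]  # SAT becomes SIT
print(changed_words(original, mutated))  # [1, 1, 1]
```

One changed letter hits one word in every frame: SAT becomes SIT, ATM becomes ITM and TSA becomes TSI.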
