The act of adding redundancy to a message so that errors in transmission can be detected, and sometimes corrected, from the received message alone. In a sense, the opposite of error-correcting coding is compression, which removes redundancy instead of adding it.

In data communications and storage technology, Error Correcting Codes (ECC) make it possible to reconstruct the original data should it be damaged in transit (as on a noisy modem line) or on the storage medium itself (such as by a scratch on a CD). All of the numerous algorithms available rely on some redundancy in the transferred data.

A simple, effective, but quite expensive (with regard to bandwidth and storage) method is sending each bit three times. An example:

  01011 => 000 111 000 111 111 => noise => 001 111 010 110 011
Each "triplet" is then decoded to its most "dominant" bit, that is, the value that appears at least twice:
  001 => 0
  111 => 1
  010 => 0
  110 => 1
  011 => 1
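
For the curious, the triple-repetition scheme above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (bits are held as a string of '0' and '1' characters), and the function names encode_repetition and decode_repetition are made up for this example rather than taken from any library.

```python
def encode_repetition(bits: str, n: int = 3) -> str:
    """Repeat every bit n times (n = 3 gives the triplets above)."""
    return "".join(b * n for b in bits)


def decode_repetition(received: str, n: int = 3) -> str:
    """Split the stream into groups of n bits and take a majority vote on each."""
    groups = [received[i:i + n] for i in range(0, len(received), n)]
    return "".join("1" if g.count("1") > n // 2 else "0" for g in groups)


sent = encode_repetition("01011")    # what goes onto the line
noisy = "001111010110011"            # the corrupted stream from the example above
decoded = decode_repetition(noisy)   # majority vote per triplet

print(sent)
print(decoded)
```

The vote only works as long as no triplet takes more than one hit; two flipped bits in the same triplet will be "corrected" to the wrong value.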

All right, let's try a more efficient example. A simple, yet often sufficient, algorithm is VRC+LRC (Vertical + Longitudinal Redundancy Check). It is basically two separate checksums, one "horizontal" and one "vertical", which together can pinpoint a single bit error in a block. Consider a data block of six 7-bit words:

  1 0 1 1 0 1 1
  1 1 0 1 0 1 1
  0 0 1 1 1 0 1
  1 1 1 1 0 0 0
  1 0 0 0 1 0 1
  0 1 0 1 1 1 1
We add a parity bit (1 if the number of 1s in the word is odd, and 0 otherwise) to each word, and a parity "row" at the bottom:
  1 0 1 1 0 1 1 1
  1 1 0 1 0 1 1 1
  0 0 1 1 1 0 1 0
  1 1 1 1 0 0 0 0
  1 0 0 0 1 0 1 1
  0 1 0 1 1 1 1 1
  0 1 1 1 1 1 1 0
Now, should one of the bits be inverted by line noise, we will end up with exactly one row whose parity is wrong and exactly one column whose parity is wrong. The bad bit sits where that row and that column cross, and that's all we need to correct it! This simple scheme can reliably correct only a single bit error per block (several errors may still be noticed, but can no longer be located unambiguously); in many cases that will suffice.
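
Here is a rough Python sketch of the same idea, using the six 7-bit words from the example. It is a minimal illustration, not a production implementation: the helper names parity, add_parity and correct_single_error are invented here, and the block is assumed to be a list of lists of 0/1 integers.

```python
def parity(bits):
    """Return 1 if the number of 1s is odd, and 0 otherwise."""
    return sum(bits) % 2


def add_parity(block):
    """Append a parity bit to each word, then a parity row at the bottom."""
    rows = [row + [parity(row)] for row in block]
    rows.append([parity(col) for col in zip(*rows)])
    return rows


def correct_single_error(rows):
    """Flip a single inverted bit in place; return its (row, col), or None if clean."""
    bad_rows = [r for r, row in enumerate(rows) if parity(row)]
    bad_cols = [c for c, col in enumerate(zip(*rows)) if parity(col)]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        rows[r][c] ^= 1  # the error sits where the bad row and bad column cross
        return (r, c)
    raise ValueError("more than one bit error; cannot locate it")


block = [
    [1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 1, 1],
]
encoded = add_parity(block)

received = [row[:] for row in encoded]
received[2][4] ^= 1                      # simulate one bit of line noise
position = correct_single_error(received)
print(position, received == encoded)
```

Note that the check treats the parity bits like any other bits, so a flipped parity bit is located and repaired the same way as a flipped data bit.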

ECCs are getting quite ubiquitous; RAM chips which employ ECC are available, and audio CDs have used it for years.

DNA also has 'error correcting' codes [1]. The genetic code [2] is made up of triplets, each of which codes for a particular amino acid - essentially three 'letters' in the DNA alphabet for every 'letter' in the protein alphabet.

The redundancy lies in the fact that the protein alphabet has only 20 letters while the DNA alphabet has four, and triplets drawn from four letters make 4 x 4 x 4 = 64 possible codes. This means that several different triplets (called codons) can code for the same amino acid. Also, a few codons represent the control codes start and stop. As an analogy to communication, DNA is the 'message' that is sent from one generation to the next. This message must include redundancy to suppress errors (mutation) - even though mutation can occasionally be useful.
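
The counting argument is easy to check by brute force. The short Python sketch below simply enumerates every triplet over the four DNA letters; the variable names are made up for illustration, and the final ratio is only a rough figure because it ignores the codons used as start and stop signals.

```python
from itertools import product

BASES = "ACGT"      # the four 'letters' of the DNA alphabet
AMINO_ACIDS = 20    # size of the protein alphabet, as stated above

# Every possible triplet (codon) of bases
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
n_codons = len(codons)

# Rough average number of codons available per amino acid
codons_per_amino_acid = n_codons / AMINO_ACIDS

print(n_codons, codons_per_amino_acid)
```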

However, there are further levels of redundancy at the level of the chemical language of proteins. Often, one letter can substitute for another, as the set of amino acids can be divided into subsets with similar chemical properties. In addition, DNA is redundant by virtue of its structure - the double helix provides a 'reversed' copy of the information in the antisense strand.

It is this redundancy that not only mitigates some of the effects of mutation, but also provides some of the raw material for diversity in a population. Although two organisms may be producing the same protein, and therefore have the same phenotype, the underlying gene may not be the same. It is this 'silent' genetic diversity (neutral drift) that can potentially allow a population to respond to a diverse set of environmental changes.


[1] Of course, most genomes are truly error-correcting in that they code for proteins that repair damage and mutations! :)
[2] Biology has probably borrowed some of the terminology of codes from electrical engineering and/or information theory, but the meanings may be slightly different.
