Molecular Biology of HIV (idea) by bitter_engineer

You can see HIV-1's genetic code for yourself, courtesy of the National Institutes of Health, at
http://www.ncbi.nlm.nih.gov/projects/genotyping/view.cgi?db=1 .

One thing that has always impressed me about HIV is how SMALL it is. The DNA strand that codes for it is only 8379 base pairs long, which makes 2793 base-pair triplets (codons), each coding for one amino acid. Since a triplet can code for any of 21 symbols (the 20 amino acids plus the 'end', or stop, instruction), each one carries log2(21) = ln(21)/ln(2) ≈ 4.4 bits of information. This means the code for HIV takes up only about 2793 × 4.4 / 8 ≈ 1537 bytes.
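The arithmetic above can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope estimate: it takes the 8379-base figure from the text as given, assumes every triplet is one of 21 equally likely symbols, and rounds up to whole bytes. With the exact value of log2(21) ≈ 4.39 rather than the rounded 4.4, the estimate comes out a few bytes smaller.

```python
import math

# Figures taken from the text above
GENOME_BASES = 8379        # length of the coding strand, in bases
CODON_LENGTH = 3           # bases per triplet (codon)
SYMBOLS_PER_CODON = 21     # 20 amino acids + the stop ("end") signal

codons = GENOME_BASES // CODON_LENGTH
bits_per_codon = math.log2(SYMBOLS_PER_CODON)  # ≈ 4.39 bits

# Round up: a partial byte still occupies a whole byte
bytes_exact = math.ceil(codons * bits_per_codon / 8)
bytes_rounded = math.ceil(codons * 4.4 / 8)    # using the rounded 4.4 bits

print(codons)                    # 2793
print(round(bits_per_codon, 2))  # 4.39
print(bytes_exact, bytes_rounded)
```

Rounding up with `math.ceil` mirrors the idea that you cannot store a fraction of a byte.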

It's kind of unnerving to know that, in certain programming languages, a Hello World program takes up more data than a virus that can hijack the human immune system.

Update
CrazyIvan pointed out to me that any DNA sequence has six possible reading frames: each strand can be read starting at one of three offsets, in either direction. This means we cannot compress each 6-bit triplet down to one of 21 amino-acid symbols, and must instead treat each base pair as an incompressible 2 bits of information. Our code size then becomes 8379 base pairs × 1 byte / 4 base pairs ≈ 2,095 bytes. (Although the implementation becomes six times more complex, most of that complexity lives in the processing machinery, not in the encoding.)
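Here is a rough Python sketch of both points in the update: how one sequence yields six reading frames, and why 2 bits per base gives about 2,095 bytes for 8379 bases. The helper names (`reverse_complement`, `reading_frames`, `pack_2bit`), the A/C/G/T-to-bits mapping, and the short demo sequence are all hypothetical, chosen for illustration; the demo is not taken from the HIV genome.

```python
# Hypothetical helpers illustrating six reading frames and 2-bit packing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def reading_frames(seq: str) -> list[list[str]]:
    """Split a sequence into codons for all six frames:
    offsets 0, 1, 2 on the given strand, then on the reverse complement."""
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frame = strand[offset:]
            # Keep only complete triplets; a trailing 1-2 bases are dropped
            frames.append([frame[i:i + 3] for i in range(0, len(frame) - 2, 3)])
    return frames


# Arbitrary (assumed) mapping: every base costs exactly 2 bits
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte; a short final chunk is zero-padded."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


demo = "ATGGCCATTGTAATGGGCCGC"  # made-up example sequence
for n, frame in enumerate(reading_frames(demo), start=1):
    print(n, " ".join(frame))

print(len(pack_2bit("A" * 8379)))  # 8379 bases at 4 per byte, rounded up
```

Packing whole bases rather than amino acids is what keeps all six frames recoverable: the packed bytes can be unpacked and re-read from any offset on either strand.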
