(Molecular biology) Most genes in the genome code for proteins (the molecules that do most of the real work in the cell, such as breaking down fuel and holding the cell together). To make a protein, the cell first transcribes the appropriate gene into mRNA, which then carries the gene's information to the ribosome (a molecular complex containing both RNA and protein) where it is translated into functional protein.

The part of the mRNA molecule that contains this information is known as its coding region, and it is composed of a series of clusters of three nucleotides known as codons. Each of the 64 different codons designates a specific amino acid (see the Universal Genetic Code) except for three: TAA, TAG and TGA (written here in DNA letters; in the mRNA itself the T is a U), which are known as stop codons. These three codons signal the ribosome to stop reading, ending the production of the protein.
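The counts in that paragraph are easy to check by brute force. What follows is only a minimal sketch: the names all_codons, sense_codons and split_codons are made up for illustration, and the nine-letter test sequence is invented rather than taken from a real gene.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Every possible three-letter combination of the four DNA letters: 4**3 = 64.
all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]

# Codons that stand for an amino acid rather than "stop".
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

def split_codons(coding_region):
    """Cut a coding region into consecutive three-nucleotide codons.

    Any one or two leftover nucleotides at the end are ignored.
    """
    return [coding_region[i:i + 3] for i in range(0, len(coding_region) - 2, 3)]

print(len(all_codons), len(sense_codons))  # 64 61
print(split_codons("ATGGCTTGA"))           # ['ATG', 'GCT', 'TGA']
```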

This all seems straightforward enough (at least, it does for molecular biologists). But when you think about it, this means that a given mRNA molecule can be read in three different ways, to create three different protein sequences. To illustrate:

             ..T CTC AGC GTT ACC ATG A..
             ... Leu Ser Val Thr Met ...

The first line is the mRNA sequence (well, technically it's a DNA sequence, but the end result is the same), while the second line is the corresponding amino acid sequence (see Universal Genetic Code for the codes, and CrazyIvan's writeup at this node for the meaning of the three-letter abbreviations). This is all well and good, but what happens if you start the decoding process one nucleotide later?

              .TC TCA GCG TTA CCA TGA 
              ... Ser Ala Leu Pro STOP!

Instead of coding for the functional protein sequence Leu-Ser-Val-Thr-Met-etc., the mRNA now codes for a completely different set of amino acids, ending in a TGA stop codon. Thus the information in the mRNA molecule is only useful if it is read in the correct way; otherwise the end result is protein gibberish, usually coming to an abrupt halt within a few dozen codons (since 3 of the 64 codons are stop codons, roughly one in every 21 random codons will halt translation).
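The two readings shown above, plus the third possible one, can be reproduced with a few lines of Python. Treat this as a sketch rather than a real translation tool: the function name translate and the variable fragment are my own, the table is the standard genetic code written in DNA letters, and the output uses one-letter amino acid codes (L = Leu, S = Ser, V = Val, T = Thr, M = Met, A = Ala, P = Pro, and so on) with * standing for a stop codon.

```python
from itertools import product

# Standard genetic code in DNA letters; "*" marks a stop codon.
# The amino acid string follows the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq, offset):
    """Translate seq codon by codon, starting `offset` nucleotides in.

    This sketch does not halt at a stop codon; it just prints "*" for it.
    """
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

# The fragment from the illustration above, with the dots removed.
fragment = "TCTCAGCGTTACCATGA"
for offset in range(3):
    print(offset, translate(fragment, offset))
# 0 SQRYH
# 1 LSVTM
# 2 SALP*
```

Offset 1 is the "correct" reading from the first illustration, offset 2 is the shifted reading that runs into TGA, and offset 0 is the third possible reading, which gives yet another unrelated sequence.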

Which brings me, finally, to the actual point of this writeup. Each of the three different ways of reading an mRNA is known as a reading frame. Any stretch of a reading frame that is not interrupted by a stop codon is known as an open reading frame. In practice, the longest open reading frame in an mRNA is generally the one that encodes the protein. For this reason, searching for open reading frames (or ORFs, as they are commonly known) is a simple way of finding genes in a newly sequenced piece of an organism's genome. Basically, if you find a long ORF, you've probably found a gene.
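The search itself is simple enough to sketch. The function below, longest_open_stretch (a name invented for this example), walks each of the three reading frames and reports the longest run of codons with no stop in it. It is a toy under stated assumptions: it only looks at the strand it is given and does not insist on a start codon, whereas real ORF finders usually also check the opposite strand and look for an ATG. The test sequence is made up so that only one frame stays open.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_open_stretch(seq):
    """Return (offset, start, n_codons) for the longest stop-free run of codons.

    offset is the reading frame (0, 1 or 2), start is the index in seq where
    the run begins, and n_codons is its length. Ties go to the earliest frame.
    """
    best = (0, 0, 0)
    for offset in range(3):
        run_start = offset
        run_length = 0
        for i in range(offset, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                # A stop codon closes the current run; the next one starts after it.
                run_start = i + 3
                run_length = 0
            else:
                run_length += 1
                if run_length > best[2]:
                    best = (offset, run_start, run_length)
    return best

# Invented sequence: frame 1 is open throughout, the other two frames hit a
# stop codon every third codon.
toy_sequence = "G" + "CTGACTAAC" * 4
print(longest_open_stretch(toy_sequence))  # (1, 1, 12)
```

On a fragment as short as the one in the illustration above, all three frames can easily be stop-free, which is why the trick only becomes useful for long stretches of sequence.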


If you enjoyed this writeup, you might also like cytoskeleton or expression, or maybe even codon. This has been an attempt at a factual writeup - if you spot errors in this writeup, or any of my writeups, please /msg me and let me know.
