Shotgun sequencing is a method of whole-genome sequencing
that is particularly suited to high-throughput, assembly-line style methodology,
allowing the entire genomes of organisms to be sequenced relatively rapidly. It is much quicker than other techniques such as chromosome walking (which was used by the International Human Genome Sequencing Consortium). The method made the news when Celera Genomics used it to complete its
draft sequence of the three billion bases of the human genome in only three
years, although there were claims that Celera had used publicly available genome data to construct its pay-to-view, annotated sequence.
How Shotgun Sequencing Works
In a shotgun sequencing experiment, the first step is to create several genomic libraries. A genomic library is a set of bacterial colonies (or other clones, such as YACs or cosmids), each of which contains a small piece of the genome being sequenced. Together, the colonies contain the entirety of the genome. Libraries are needed because
automated Sanger sequencing, the preferred method, is limited to sequence
reads of several hundred base pairs at the most, after which accuracy drops
off dramatically. Ideally we would start at the beginning of the genome and read straight through to the end, like a book, but for now the genome has to be read in small pieces.
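To give a sense of scale, the arithmetic can be sketched as follows. The 500-base read length and the eightfold redundancy are illustrative assumptions, not figures from any particular project, and reads_needed is a hypothetical helper name.

```python
import math

def reads_needed(genome_size: int, read_length: int, coverage: float = 1.0) -> int:
    """Number of reads required to cover the genome `coverage` times over."""
    return math.ceil(genome_size * coverage / read_length)

HUMAN_GENOME = 3_000_000_000  # three billion bases
READ_LENGTH = 500             # assumed: "several hundred" bases per Sanger read

# Reading every base exactly once, with no overlap between reads.
single_pass = reads_needed(HUMAN_GENOME, READ_LENGTH)

# Assumed eightfold redundancy, so that neighbouring reads overlap.
redundant = reads_needed(HUMAN_GENOME, READ_LENGTH, coverage=8)

print(single_pass, redundant)
```

Even a single pass requires millions of separate reads, which is why the method depends on robotic equipment and computers.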
To create these libraries,
restriction enzymes are used to cut the genomic DNA of the desired organism into
thousands of smaller pieces, which are then cloned into bacterial plasmids and transformed into bacteria (typically E. coli). The experimenter now possesses thousands of distinct bacterial colonies, each with a small piece of the organism's genome. Several such libraries are made with different restriction enzymes, so that the pieces in one library overlap the cut sites of another and together give overlapping coverage of the genome.
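The effect of using more than one enzyme can be illustrated with a toy digest. This is a minimal sketch: the 45-base "genome" is invented for the example, only one strand is modelled, and digest is a hypothetical helper. The recognition sites are those of EcoRI (G^AATTC) and BamHI (G^GATCC), each of which cuts after the first base of its site.

```python
def digest(genome: str, site: str, cut_offset: int) -> list[str]:
    """Cut `genome` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    pos = genome.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            fragments.append(genome[start:cut])
            start = cut
        pos = genome.find(site, pos + 1)
    if start < len(genome):
        fragments.append(genome[start:])  # whatever remains after the last cut
    return fragments

# Invented toy sequence containing two EcoRI sites and two BamHI sites.
genome = "TTAGGCGAATTCCATGGATCCTTACGGAATTCAAGTGGATCCGTA"

ecori_library = digest(genome, "GAATTC", 1)
bamhi_library = digest(genome, "GGATCC", 1)

print(ecori_library)
print(bamhi_library)
```

Because the two enzymes cut at different positions, every cut point in one library falls inside a fragment of the other, and it is this overlap that later allows the pieces to be ordered.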
Next, each of the genome pieces in the libraries is individually sequenced and catalogued, usually using robotic equipment. After
all the sequencing is finished, the result is a database containing thousands
of sequences. High-powered computers match up the sequences by looking for
overlaps between them, eventually building stretches of contiguous sequence
thousands of base pairs long, called contigs. These contigs are then matched
together using various methods to provide a draft sequence.
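The overlap-matching step can be sketched with a simple greedy merge. This is an illustration under strong assumptions, not the algorithm used by any particular assembler: the reads are error-free, all come from the same strand, and the function names and the minimum overlap of three bases are chosen for the example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (0 if under min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard any sequence wholly contained in another one.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]

# Three overlapping reads taken from the invented sequence ATGGCGTACGTTAGC.
reads = ["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

With these three reads the merge produces a single contig. Reads that share no sufficiently long overlap are left as separate contigs, which then have to be ordered by other means.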
This approach of sequencing the genome in thousands of
sections is evocative of the hundreds of pellets in a shotgun shell, giving
rise to the name.
Problems with Shotgun Sequencing
A serious problem with shotgun sequencing is the prevalence of repetitive DNA. A
significant proportion of eukaryotic genomes is composed of repeated 'junk'
DNA. These repeats range from tens to thousands of base pairs in length, and
the number of copies may also vary widely. Because a stretch of repeat DNA
is composed of identical subunits, it is impossible to determine from the
overlap of sequence reads how many copies there are whenever the stretch is
longer than a single read. In addition, many repeats
are widely distributed throughout the genome, further complicating attempts
to pin down their location and extent. These problems can be worked around by
more traditional genetic methods, such as recombinant analysis.
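The ambiguity can be demonstrated with two invented sequences that differ only in the number of copies of a three-base repeat. The sketch compares the set of distinct reads each sequence can produce. The flanking sequences, the repeat unit and the read lengths are assumptions made for the example, and all_reads is a hypothetical helper.

```python
def all_reads(genome: str, read_len: int) -> set[str]:
    """Every distinct read of length `read_len` that `genome` can yield."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

left, right = "TTGACC", "GGATAT"  # invented unique flanking sequence
unit = "CAG"                      # invented repeat unit

genome_a = left + unit * 5 + right  # five copies of the repeat
genome_b = left + unit * 8 + right  # eight copies of the repeat

# Reads shorter than the repeated stretch: the two genomes look identical.
short_reads_match = all_reads(genome_a, 8) == all_reads(genome_b, 8)

# Reads long enough to span the five-copy stretch and touch both flanks
# do tell the two genomes apart.
long_reads_match = all_reads(genome_a, 20) == all_reads(genome_b, 20)

print(short_reads_match, long_reads_match)
```

With eight-base reads, no read reaches from one flank to the other, so nothing in the overlaps records how many copies lie in between.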