Feature: Genomics - the next generation
28 July 2009. By Mun-Keat Looi
The advent of the Human Genome Project and the development of faster, cheaper DNA sequencing revolutionised medical and biological research - and the technology is still evolving. Mun-Keat Looi looks at the platforms pushing us closer to the ‘$1000 genome’.
At the turn of the millennium, genomics captured the public's imagination with the publication of the first draft of the human genome and the insights it offered into our biology.
When the Human Genome Project started in 1990, DNA sequencing entailed laborious radiation-based methods, with researchers manually loading electrophoresis gels and painstakingly reading bases from the resulting images. Today's sequencing technologies are easier, faster and - crucially - cheaper.
"It used to take us half a million pounds and three years to generate one bacterial genome sequence. Now you can do 100 bacterial genomes with a single run," says Dr Julian Parkhill, Director of Sequencing at the Wellcome Trust Sanger Institute. "We can do things with the new machines that were inconceivable two, three years ago."
This has changed the way scientists approach their research, empowering them to compare the genomes of many organisms at a time, and to look for the minute differences that provide new understanding of disease and biology.
Do you 454?
The workhorse of the Human Genome Project was the capillary sequencer, which removed the need for radiation and for pouring gels by automating the DNA-sequencing method invented by Fred Sanger in the 1970s (see animation below). This vastly improved the process, but even so it could take over a year to read one gigabase of DNA (one billion bases, or a third of the human genome), and at a high cost ($0.10 per 1000 bases).
Step forth 454 Life Sciences, who in 2005 launched the first next-generation DNA sequencer. Their technology anchors DNA fragments to tiny beads (one fragment for each bead), which are then put into wells on a plate (see animation below). Nucleotides are washed over the wells in waves and, as they are incorporated into the new DNA strand, the intensity of the light given off is used as a measure of how many As, Ts, Cs or Gs have been incorporated.
This innovative thinking was a giant leap in DNA sequencing. The latest 454 machines are able to read one gigabase of DNA sequence in a couple of days, at a cost of $0.02 per 1000 bases.
But the platform is not without its weaknesses, particularly its difficulty distinguishing the number of bases in a run of identical bases (such as AAAA).
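The wave-by-wave readout, and the reason runs of identical bases are hard to count, can be sketched in a few lines of Python. This is a toy model rather than 454's actual software: the function names, the "TACG" wash order and the noisy signal values are all assumptions made for illustration, and the model reads the newly made strand directly instead of working from the template.

```python
def simulate_flows(read, flow_order="TACG"):
    """Return (base, signal) pairs for an idealised, noise-free run.

    Each wave washes one nucleotide over the well. The signal is the
    number of bases incorporated in that wave, so a run such as AAAA
    gives a single signal of 4 rather than four separate signals of 1.
    """
    if not set(read) <= set(flow_order):
        raise ValueError("read contains a base that is never washed over")
    flows = []
    pos = 0
    while pos < len(read):
        for base in flow_order:
            run = 0
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            flows.append((base, run))
            if pos == len(read):
                break
    return flows


def call_bases(flows):
    """Turn (base, signal) pairs back into a sequence by rounding each signal."""
    return "".join(base * round(signal) for base, signal in flows)


if __name__ == "__main__":
    ideal = simulate_flows("TAAAAG")
    print(ideal)              # one signal of 4 for the AAAA run
    print(call_bases(ideal))  # recovers TAAAAG

    # Hypothetical noisy signals: a small error on a large signal
    # is enough to miscount the run of As.
    noisy = [("T", 1.1), ("A", 4.6), ("C", 0.1), ("G", 0.9)]
    print(call_bases(noisy))  # calls five As instead of four
```

In this sketch, a signal of 1.1 is still clearly one base, but 4.6 sits between four and five, which is the kind of ambiguity the platform struggles with in runs of identical bases.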
Everything is illuminated
In 2006, Solexa debuted a new sequencing technology, amplifying single DNA fragments in dense clusters on a hollow slide to provide stronger fluorescence signals (see animation below). The platform made its mark, delivering the first African, Asian and cancer patient genomes. Later the same year, biotech firm Illumina agreed to buy Solexa for $615 million.
At the same time, another competitor, Applied Biosystems, rolled out its SOLiD (Sequencing by Oligonucleotide Ligation and Detection) technology. Unlike the other sequencing platforms, which rely on a DNA polymerase adding bases one by one to replicate a new DNA strand, SOLiD sequences by ligation, hybridising a range of probes to the DNA template. This facilitates its real trump card: a 'two-base' read that, the company says, improves accuracy by essentially reading every base twice. This provides a higher confidence that a single base change spotted in a genome is indeed a real variation from the standard human genome and not just a mistake made by the system.
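How reading every base twice helps can be shown with a small Python sketch. It assumes the commonly described four-colour scheme in which each colour reports on a pair of neighbouring bases; the XOR trick, the function names and the example sequences are illustrative choices and not the company's own code.

```python
# Assumed 2-bit codes for the bases; XOR-ing two neighbouring codes
# gives one of four colours (0-3) for that pair.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(seq):
    """Return the first base plus one colour per pair of adjacent bases."""
    codes = [BASE_CODE[base] for base in seq]
    return seq[0], [a ^ b for a, b in zip(codes, codes[1:])]


def decode(first_base, colours):
    """Rebuild the base sequence from the first base and the colours."""
    code = BASE_CODE[first_base]
    bases = [first_base]
    for colour in colours:
        code ^= colour
        bases.append(CODE_BASE[code])
    return "".join(bases)


def changed_positions(colours_a, colours_b):
    """Indices at which two colour reads of equal length differ."""
    return [i for i, (a, b) in enumerate(zip(colours_a, colours_b)) if a != b]


if __name__ == "__main__":
    first, ref_colours = encode("ACGTAC")
    _, snp_colours = encode("ACATAC")  # one real base change, G to A
    print(ref_colours)
    print(snp_colours)
    # A real single-base change alters two adjacent colours.
    print(changed_positions(ref_colours, snp_colours))
    print(decode(first, ref_colours))
```

Because every base sits in two overlapping pairs, a genuine single-base change shows up as two adjacent colour changes. A lone colour change is more likely a reading error, since taking it at face value would alter every base downstream of it.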
Both Illumina and SOLiD sequence DNA around 20 times more cheaply than the 454 technology, at $0.001 per 1000 bases, and take just half a day to read one gigabase. They also have the advantage of being able to run more samples simultaneously.
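To put the quoted prices on a common footing, the short Python sketch below converts each cost per 1000 bases into a cost per gigabase and checks the '20 times' comparison. The prices are the ones given in this article; the names are made up for the example, and the sums cover raw reads only, ignoring the repeated coverage a finished genome needs.

```python
# Prices quoted in the article, in US dollars per 1000 bases.
COST_PER_KILOBASE = {
    "capillary": 0.10,
    "454": 0.02,
    "Illumina/SOLiD": 0.001,
}

KILOBASES_PER_GIGABASE = 1_000_000


def cost_per_gigabase(cost_per_kilobase):
    """Scale a price per 1000 bases up to a price per billion bases."""
    return cost_per_kilobase * KILOBASES_PER_GIGABASE


def times_cheaper(older, newer):
    """How many times cheaper the newer platform is per base."""
    return COST_PER_KILOBASE[older] / COST_PER_KILOBASE[newer]


if __name__ == "__main__":
    for platform, cost in COST_PER_KILOBASE.items():
        print(f"{platform}: ${cost_per_gigabase(cost):,.0f} per gigabase")
    print(f"454 to Illumina/SOLiD: about {times_cheaper('454', 'Illumina/SOLiD'):.0f}x cheaper")
```

On these figures, a gigabase of raw sequence falls from about $100,000 on a capillary machine to about $20,000 on 454 and about $1,000 on Illumina or SOLiD.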
The key difference is the length of DNA fragment that each platform can read. While 454 can read up to 450 bases in a row, Illumina and SOLiD concentrate on faster, more accurate, but much shorter reads of 75 and 50 bases respectively. That can make the final assembly or mapping process harder. If an organism's genome is being sequenced for the first time (known as de novo sequencing), it's like doing a 1000-piece jigsaw puzzle with no reference picture.
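The jigsaw comparison can be made concrete by counting the pieces. The Python sketch below works out the minimum number of reads needed to span one gigabase end to end at each platform's read length. It is a deliberately idealised lower bound: real projects need overlapping reads and many-fold coverage, and the function name is invented for the example.

```python
# Maximum read lengths quoted in the article, in bases.
READ_LENGTH = {"454": 450, "Illumina": 75, "SOLiD": 50}


def minimum_reads(genome_length, read_length):
    """Fewest non-overlapping reads that span the genome (rounded up)."""
    return -(-genome_length // read_length)


if __name__ == "__main__":
    gigabase = 1_000_000_000
    for platform, length in READ_LENGTH.items():
        print(f"{platform}: {minimum_reads(gigabase, length):,} reads")
```

Even in this best case, the short-read platforms leave roughly six to nine times as many pieces to fit together as 454 does.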
"Cost-wise and in terms of the amount of data produced per run, you would go with the short reads if you could get away with them, but you can't for every application," says Dr Daniel Turner, Head of Sequencing Technology Development at the Sanger Institute.
Pathogen researchers, for example, prefer 454: the relatively small size of a bacterial genome (a few million base pairs of DNA compared to the three billion in the human genome) means that one genome can be assembled almost completely with just one run of the long-read platform. The short-read platforms of Illumina and SOLiD lend themselves better to resequencing, when the genome has already been sequenced once but a researcher wants to pick out unique variations in individuals.
It all depends on the needs of the research.
"There's always a shortfall between what you'd like to do and what you can do - what you can afford to do," says Dr Harold Swerdlow, Head of Sequencing Technology at the Sanger Institute.
"People talk about the low-hanging fruit a lot in genomics. Would you rather sequence the whole genomes of 10 patients accurately at every location, or 1000 patients reasonably accurately, with a decent chance that you'll find something medically relevant? That's what they're balancing at present cost levels."
The next next generation
But as 454, Illumina and SOLiD vie for market share, the third generation of sequencers is approaching.
Pacific Biosciences uses a technique anchoring the polymerase to the bottom of a well. The DNA sample is fed through, with fluorescent signals being read as fast as the enzyme can work - and polymerases are naturally able to work extremely fast.
"There are no terminators, so you are not trying to artificially stop the polymerase to take a picture, as the current technologies do," says Dr Parkhill. "You can read bases per second rather than a base every 20 minutes, with potential reads of kilobases or longer."
The technology also needs just a single molecule of DNA, doing away with the current need to amplify a DNA sample using the polymerase chain reaction (PCR), which can introduce errors.
Oxford Nanopore Technologies goes further. Rather than sequencing DNA by adding bases, the company cuts them off one by one using an enzyme called an exonuclease. The free bases then pass through a tiny pore, with different bases producing different electrical signals - though it remains to be seen if this is as reliable as measuring fluorescence.
Meanwhile, the current generation continue to optimise their methods. Illumina and Applied Biosystems say they will be able to sequence human genomes for just $10 000 by the end of 2009. And a new player, Complete Genomics, claims it will soon be offering whole human genome sequencing as a service for just $5000.
All are vying to be the first to offer the '$1000 genome', at which point sequencing will start to become viable for routine clinical use.
Says Dr Parkhill, "If you look ten years down the line, then we will have real-time DNA sequencing. You'll go to the doctor's with an infection and they'll take a blood or sputum sample and sequence it, on the spot, to find out what's in it."
Image: DNA double helix and sequencing output. Credit: Peter Artymiuk, Wellcome Images