The 1995 paper in Science, entitled “Whole-genome random sequencing and assembly of Haemophilus influenzae Rd” was, as Francis Collins had indicated, truly a milestone. It represented the first complete genome of a free-living organism. With H. influenzae, scientists could for the first time investigate the complete range of digital code—and hence analogue machinery in the form of the corresponding proteins—that was required for life. This was important not only for the promise of future, detailed knowledge about how cells function, but also for demonstrating that it was possible—at least for a bacterium—to obtain the complete digital code that ran an organism. Until TIGR’s paper, the possibility remained that there would be some final, unsuspected obstacle to elucidating the detailed chemical text of the program. In a sense, Venter’s work also validated the entire concept of genomics (the study of entire genomes) beyond that of traditional genetics (the study of individual genes). It implicitly marked the start of a new phase in molecular biology, one that was based on complete digital knowledge of an organism. The long-term effects of this shift will be so profound that future generations will probably struggle to imagine how it was possible to conduct biological sciences and medicine without genomes.
The paper from Venter and his team provided some details of how the work was carried out. First, it noted the continuity with his earlier EST work: “The computational methods developed to create assemblies from hundreds of thousands of 300- to 500-bp [base pairs] complementary DNA (cDNA) sequences led us to test the hypothesis that segments of DNA several megabases [millions of bases] in size could be sequenced rapidly, accurately and cost-effectively by applying a shotgun sequencing strategy to whole genomes.” As with the EST work, the key to Venter’s success was the use of plenty of powerful technology. The paper states that it took 14 ABI sequencing machines, run by eight technicians for three months, to produce 23,304 sequence fragments. These were put together using the TIGR Assembler program, running on a SPARCenter 2000 with 512 megabytes of RAM—a huge amount for 1995. Even so, the assembly took 30 hours of central processing unit time to produce 210 contigs—unbroken sequences formed from the overlapping shotgun fragments. The gaps between these contigs were closed using a variety of methods to complete the sequence. The lead author of the Science paper, Robert Fleischmann, recalled the moment when everything came together: “Lo and behold, the two ends joined. I was as stunned as anyone.” Unlike human chromosomes, bacterial DNA is generally in a closed, circular form.
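To make the idea of contigs concrete, the short Python sketch below shows how overlapping fragments can be merged into longer unbroken sequences. It is a toy illustration only, not the TIGR Assembler: the function names `overlap` and `assemble`, the `min_len` threshold and the greedy merging rule are all assumptions made for this example, and it ignores sequencing errors, repeats and reverse-complement reads that a real assembler must handle.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedy toy assembly (hypothetical helper, not TIGR's algorithm).

    Repeatedly merges the pair of sequences with the largest overlap
    until no pair overlaps by at least min_len bases. What remains
    is the list of contigs.
    """
    # Drop exact duplicates, then fragments wholly contained in another.
    contigs = list(dict.fromkeys(fragments))
    contigs = [f for f in contigs
               if not any(f != g and f in g for g in contigs)]
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to join: gaps remain between contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three made-up fragments that tile one short sequence.
    print(assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
    # Fragments with a gap: two contigs are left over.
    print(assemble(["AAACCC", "CCCGGG", "TTTTAT"]))
```

In the second call the fragments do not all overlap, so more than one contig is left, which is the situation the paper describes with its 210 contigs before the gaps were closed.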
The final result was a genome that was 1,830,137 base pairs long, obtained at an average cost of 48 cents each. This was another breakthrough for Venter. As he himself remarked at the time: “People thought [that bacterial genomes] were multiyear, multimillion-dollar projects. We’ve shown that it can be done in less than a year and for less than 50 cents per base.” The consequence, he noted, was that “it’s opened the floodgates.” TIGR itself went on to sequence dozens more bacteria—discussed in chapter 14—and others soon followed in its footsteps. But Venter was keenly aware of even broader implications. The Science paper’s peroration suggested various areas where the whole-genome shotgun approach could be usefully applied. And as usual, Venter saved his most provocative thought for the last, throwaway line: “Finally, this strategy has potential to facilitate the sequencing of the human genome.”