A special protein called DNA polymerase moves along the DNA sequence to produce a copy letter by letter. Using what are called primers—short sequences of nucleotides from the beginning and end of a particular stretch of DNA—it is possible to make copies of just that section of the genome flanked by the primers. PCR soon became one of the most important experimental techniques in genomics. It provides a way of carrying out two key digital operations on the analogue DNA: searching through huge lists of chemical letters for a particular sequence defined by its beginning and end, and copying that sequence perfectly billions of times. In 1993, Mullis was awarded the Nobel Prize in chemistry.
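The two "digital operations" attributed to PCR above, searching for a stretch defined by its beginning and end and then copying it, can be sketched in a few lines of Python. This is only an illustration under loud simplifying assumptions: the function names `find_amplicon` and `ideal_copy_count` are hypothetical, both primers are written as they appear on a single strand (real PCR works on both strands and uses the reverse complement for one primer), and the copy count assumes perfect doubling in every cycle.

```python
from typing import Optional


def find_amplicon(template: str, start_primer: str, end_primer: str) -> Optional[str]:
    """Return the stretch of `template` flanked by the two primers, inclusive.

    Simplification: both primers are matched on the same strand, as plain
    substrings. Returns None if either primer is not found.
    """
    start = template.find(start_primer)
    if start == -1:
        return None
    # Look for the end primer only after the start primer.
    end = template.find(end_primer, start + len(start_primer))
    if end == -1:
        return None
    return template[start:end + len(end_primer)]


def ideal_copy_count(cycles: int, initial_copies: int = 1) -> int:
    """Copies after `cycles` rounds, assuming every round doubles the total."""
    return initial_copies * 2 ** cycles


# Made-up example sequence and primers, for illustration only.
template = "TTGACCATGGCTAGCTAGGATCCAAGT"
amplicon = find_amplicon(template, "ATGG", "GATC")
copies = ideal_copy_count(30)
```

Under these assumptions, thirty doubling cycles starting from a single molecule already yield more than a billion copies, which is the scale the text refers to.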
MOLGEN’s SEQ software included a kind of software emulator, allowing scientists to investigate simple properties of various combinations of DNA code before implementing them in live organisms. In many ways, the most important part of SEQ was the DNA sequence analysis suite. There was a complementary program for analyzing proteins, called PEP, similar to Margaret Dayhoff’s early software. Similarities in DNA produce protein similarities, though protein similarities may exist even in the absence of obvious DNA matches.
Different DNA sequences can produce the same protein. The reason is that several different codons can correspond to the same amino acid. For example, alongside AAA, the codon AAG also adds lysine, while CAA has the same effect as CAG, coding for glutamine. If one sequence has AAA while the other has AAG, the DNA is different, but the amino acid that results is not. When Dayhoff began her work, there were so few defined nucleotide sequences that most similarity searches were conducted on the relatively more abundant protein sequences. By the time SEQ was written, however, there were many more known DNA sequences, and the new techniques of Sanger and Gilbert were beginning to generate a flood of them.
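The codon redundancy described above can be made concrete with a small sketch. It uses only the four codons named in the text; the table name `CODON_TABLE`, the function `translate`, and the two example sequences are illustrative assumptions, and a real translation tool would use the full 64-codon genetic code.

```python
# Partial codon table: only the four codons mentioned in the text.
CODON_TABLE = {
    "AAA": "Lys",  # lysine
    "AAG": "Lys",  # lysine
    "CAA": "Gln",  # glutamine
    "CAG": "Gln",  # glutamine
}


def translate(dna: str) -> list:
    """Translate a DNA string into amino acids, three letters at a time."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # A codon missing from the partial table raises KeyError.
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


# Two made-up sequences that differ at the third letter of each codon.
seq_a = "AAACAA"
seq_b = "AAGCAG"

protein_a = translate(seq_a)
protein_b = translate(seq_b)
same_dna = seq_a == seq_b              # False: the DNA differs
same_protein = protein_a == protein_b  # True: the amino acids do not
```

The two sequences differ in two of their six letters, yet both translate to lysine followed by glutamine, which is why a protein-level comparison can reveal similarities that a letter-by-letter DNA comparison would understate.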
MOLGEN was made available to researchers on the Stanford University Medical Experimental computer for Artificial Intelligence in Medicine (SUMEX-AIM). As Brutlag explains: “That computer was intended specifically for artificial intelligence research and medicine. In order to make it available to many of their collaborators they had that computer available on what was then the ARPANET, which let other collaborators that had access to the ARPANET access it.” Brutlag and his colleagues were able to take advantage of this existing infrastructure to create the first online molecular databases and bioinformatics tools: “so we made use of that and we got permission from the developers of SUMEX-AIM to make our programs and databases available to the molecular biology community.” But they wanted to go further.
“We had tried to get a central resource funded before from NSF”—the National Science Foundation, the main government funding body for science in the United States. “We proposed to take programs from many individuals around the world that were written in different [computer] languages and to put them onto one kind of computer.” These programs would then be made available over the ARPANET. But the NSF was not interested. “They said well, this is the sort of thing that should really be done in the commercial sphere, and they didn’t fund us,” Brutlag recalls.