Digital Revolution and Bioinformatics

February 02, 2022
Around the time that Staden was laying the computational foundations for sequencing, a pioneer across the Atlantic was beginning important work on what was to prove another enduring thread of the bioinformatics story: online databases.

Doug Brutlag, a professor of biochemistry at Stanford University, explained to me how this came about: “We were studying sequences as far back as 1975–1976, that was when I first became aware of the informatics problems of analyzing the data we were getting from sequencing.” One of the central issues that he and his colleagues grappled with was “to try to find out how was best to serve the scientific community by making the sequences and the applications that worked on the sequences available to the scientific community at large. At that time there was no Internet, and most people were exchanging sequences on floppy discs and tapes. We proposed distributing the sequences over what was then called the ARPANET.” The ARPANET was created in 1969 and was the precursor to the Internet.

Brutlag and his colleagues achieved this in part through the MOLGEN (for Molecular Genetics) project, started in 1975 at Stanford University. One aim of MOLGEN was to act as a kind of intelligent assistant to scientists working in the field. An important part of the system was a series of programs designed to aid molecular biologists in their study of DNA by helping them carry out key tasks on a computer without the need to program.

For example, Brutlag and his colleagues described the SEQ analysis system, based on earlier software, as “an interactive environment for the analysis of data obtained from nucleotide sequencing and for the simulation of recombinant DNA experiments. The interactive environment and self-documenting nature of the program make it easy for the non-programmer to use.”

The recombinant DNA experiments refer to an important breakthrough in 1973, when a portion of the DNA from one organism was inserted into the sequence of another to produce something new that was a combination of both: recombinant DNA, the technique behind what became known as genetic engineering. That this was possible was a direct consequence not just of the digital nature of DNA—had an analogue storage process been involved, it is not clear how such a simple addition would have been possible—but also of the fact that the system for storing that digital information, the sequence of As, Cs, Gs, and Ts, is essentially universal. Put another way, the biological software that runs in one organism is compatible with the computing machinery—the cells—of every other. While mankind uses messy and inefficient heterogeneous computer standards, Nature, it seems, has sensibly adopted a universal one.
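The analogy can be made concrete with a minimal sketch (the sequences below are hypothetical, chosen purely for illustration): because DNA is digital, splicing one organism's fragment into another's sequence is, conceptually, just string insertion over the shared four-letter alphabet.

```python
# Illustrative sketch: recombinant DNA as string insertion.
# HOST and FRAGMENT are made-up sequences, not real genes.

HOST = "ATGCCGTA"       # hypothetical host (e.g., bacterial) sequence
FRAGMENT = "GGATCC"     # hypothetical donor fragment

def splice(host: str, fragment: str, position: int) -> str:
    """Insert a donor fragment into a host sequence at a cut position."""
    assert set(host + fragment) <= set("ACGT"), "only the four DNA letters"
    return host[:position] + fragment + host[position:]

recombinant = splice(HOST, FRAGMENT, 4)
print(recombinant)  # ATGCGGATCCCGTA
```

The point of the sketch is the one made in the text: the operation works only because both "files" use the same encoding, so the result is immediately readable by the host's cellular machinery.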

The practical consequence of this single platform was the biotechnology revolution of the 1980s. Biotech pioneers like Genentech (for Genetic Engineering Technology) were able to use one organism—typically a simple bacterium—as a biological computer to run the DNA code from another species—humans, for example.

By splicing a stretch of human DNA that coded for a particular protein—say, insulin—into bacteria, and then running this recombinant DNA by letting the modified bacteria grow and reproduce, Genentech was able to manufacture insulin artificially as a by-product that could be recovered and sold. Similarly, thanks to recombination, researchers could use bacteria as a kind of biological copying system. Adding a DNA sequence to a bacterium’s genome and then letting the organism multiply many times generates millions of copies of the added sequence.
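The copying arithmetic is simple exponential growth, sketched below (the 20-minute division time for fast-growing bacteria such as E. coli is a commonly cited laboratory figure, not taken from the text):

```python
# A minimal sketch of why bacterial growth works as a copying system:
# each cell division doubles the population, so the number of copies of
# the inserted sequence grows as 2**generations.

def copies_after(generations: int, initial: int = 1) -> int:
    """Copies of the inserted sequence after the given number of doublings."""
    return initial * 2 ** generations

# At roughly one division every 20 minutes, 20 generations take under 7 hours:
print(copies_after(20))  # 1048576 copies from a single modified cell
```

Twenty doublings already yield over a million copies, which is why "letting the organism multiply many times" so cheaply generates the millions of copies the text describes.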

