Bioinformatics and Computing History

  February 02, 2022
Unlike DNA, with its neatly paired double helix, the history of bioinformatics involves many strands, often woven together in complex ways. If the field has a point of departure, it can perhaps be traced to a moment right on the cusp of computing history, and even before Watson and Crick’s momentous paper.

It was back in 1947 that a remarkable scientist called Margaret Dayhoff used punched-card business machines to calculate molecular energies of organic molecules. The scale of these computations made the use of traditional hand-operated calculators infeasible. Dayhoff’s conceptual leap to employing protocomputers as an aid showed daring and doggedness—a calculation typically took four months of shuffling punched cards around—that was to prove a hallmark of her later career in the world of DNA and proteins.

One of her main spiritual heirs and a key figure in the bioinformatics world, David Lipman, has no doubts about her importance, telling me that “she was the mother and father of bioinformatics.” He bases this view on the fact that “she established the three major components of what a bioinformaticist does: a mixture of their own basic discoveries with the data, which are biological discoveries; tool development, where they share those tools with other people; and resource development. She did all three, and she did incredibly important things in all three.”

As the long list of her publications indicates, her main interest was in the origin of life. It was the research into the evolution of biological molecules that led her in 1961 to begin a lifelong study of the amino acid sequences that make up proteins. Since proteins form the building blocks of life, their amino acid sequences have changed only slowly with time. The reason is clear: any major difference in sequence is likely to cause a correspondingly major change in a key biological function, or the loss of it altogether. Such an alteration would often prove fatal for the newly evolved organism, so it would rarely be propagated to later generations. By contrast, very small changes, individually without great implications for biological function, could gradually build up over time to create entirely new functions. As a result, when taken together, the slowly evolving proteins provide a rich but subtle kind of molecular fossil record, preserving vestiges of the very earliest chemical structures found in cells. By establishing which proteins are related and comparing their differences, it is often possible to guess how they evolved and to deduce what their common ancestor was hundreds of millions of years ago.
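
To make the idea of comparing related proteins concrete, here is a minimal sketch in Python. It is not Dayhoff's software or method: it simply counts the positions at which two already-aligned sequences of equal length differ and turns that count into a percent identity. The function names and the three short sequences are invented for illustration and are not real protein data.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two pre-aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions that carry the same amino acid."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = len(seq_a) - count_differences(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


# Hypothetical toy fragments in one-letter amino acid code (invented data).
toy_a = "MKTAYIAKQR"
toy_b = "MKTAYLAKQR"  # one substitution relative to toy_a
toy_c = "MRTGYLSKHR"  # five substitutions relative to toy_a

for name, other in (("toy_b", toy_b), ("toy_c", toy_c)):
    print(name, count_differences(toy_a, other), percent_identity(toy_a, other))
```

Under this crude measure, toy_b would be judged a closer relative of toy_a than toy_c is. Real comparisons also have to handle insertions and deletions and to weigh some substitutions as more likely than others, which this sketch ignores.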

To make these comparisons, it was first necessary to collect and organize the proteins systematically: these data formed the basis of Dayhoff’s famous Atlas of Protein Sequence and Structure, a book first published in 1965. Once the data were gathered in this form, Dayhoff could then move on to the next stage, writing software to compare their characteristics—another innovative approach that was a first for the period. Thanks to this resource and tool development, Dayhoff was able to make many important discoveries about conserved patterns and similarities among proteins.

The first edition of the Atlas contained 65 protein sequences; by the time the fourth edition appeared in 1969, there were over 300 proteins. But the first DNA sequence, just 12 chemical letters long, was only obtained in 1971. The disproportion between these figures arose because at the time, and for some years after, sequencing DNA was even harder than elucidating the amino acids of proteins. This finally changed in 1977, when two methods were devised: one by Allan Marshall Maxam and Walter Gilbert in the United States, at Harvard; the other by Frederick Sanger in the United Kingdom, at Cambridge. Gilbert and Sanger would share the 1980 Nobel Prize in Chemistry for these discoveries. Remarkably, it was Sanger’s second Nobel Prize. His first, also in chemistry, awarded in 1958, was for his work elucidating the structure of proteins, especially that of insulin, which helps the body to break down sugars.
