It was back in 1947 that a remarkable scientist called Margaret Dayhoff used punched-card business machines to calculate the molecular energies of organic molecules. The scale of these computations made traditional hand-operated calculators infeasible. Dayhoff's conceptual leap to employing protocomputers as an aid showed a daring and doggedness (a single calculation typically took four months of shuffling punched cards around) that would prove a hallmark of her later career in the world of DNA and proteins.
One of her main spiritual heirs, and a key figure in the bioinformatics world, David Lipman has no doubts about her importance, telling me: "She was the mother and father of bioinformatics." He bases this view on the fact that "she established the three major components of what a bioinformaticist does: a mixture of their own basic discoveries with the data, which are biological discoveries; tool development, where they share those tools with other people; and resource development. She did all three, and she did incredibly important things in all three."
As the long list of her publications indicates, her main interest was the origin of life. It was her research into the evolution of biological molecules that led her, in 1961, to begin a lifelong study of the amino acid sequences that make up proteins. Because proteins are fundamental building blocks of life, their amino acid sequences have changed only slowly over time. The reason is clear: any major difference in sequence is likely to cause a correspondingly major change in a key biological function, or its loss altogether. Such an alteration would often prove fatal for the newly evolved organism, so it would rarely be passed on to later generations. By contrast, very small changes, individually of little consequence for biological function, could gradually accumulate over time to create entirely new functions. Taken together, then, the slowly evolving proteins provide a rich but subtle kind of molecular fossil record, preserving vestiges of the very earliest chemical structures found in cells. By establishing which proteins are related and comparing their differences, it is often possible to work out how they evolved and to deduce what their common ancestor looked like hundreds of millions of years ago.