DNA is a molecule that does two things. First, it acts as the hereditary material, which is passed down from generation to generation. Second, it directs, to a considerable extent, the construction of our bodies, telling our cells what kinds of molecules to make and guiding our development from a single-celled zygote to a fully formed adult. These two things are of course connected. The DNA sequences that construct the best bodies are more likely to get passed down to the next generation because well-constructed bodies are more likely to survive and thus to reproduce. This is Darwin’s theory of natural selection stated in the language of DNA.
The story of DNA as hereditary material is well known. We all know that, in the middle of the twentieth century, the American James Watson and the Englishman Francis Crick discovered the double-helix structure of DNA. It is this double helix that makes up our genes. It is this DNA molecule that is packaged into our eggs or sperm and that is inherited by our children, making them resemble us.[1]
The story of how biologists came to understand the way DNA helps to direct the construction of our bodies is less well known. Somehow our DNA tells our cells how to make hemoglobin, collagen, and thousands of other molecules—and how to make the human form of these molecules, not, say, the cat form, which is slightly different. How does the information that is encoded in our DNA get read by our cells, specifying the structure of the many thousands of different molecules that make us up? This is, roughly speaking, the problem of the genetic code. And this code was cracked during the 1950s and 1960s in some of the most profound and beautiful work in the history of biology.
Matthew Cobb tells this story in his latest book, Life’s Greatest Secret. Cobb, a professor of zoology at the University of Manchester, is a working geneticist. He is also a student of the history of science who has written several previous books on the history of biology. Life’s Greatest Secret is aimed at the general reader who may have only a passing familiarity with biology, much less with the detailed molecular mechanics of how DNA does what it does. The book serves as a useful primer for those interested in the brave new world of genetic intervention made possible by the rise of biotechnology. But Cobb’s book will also be of interest to professional scientists as it recounts events in one of the most transformative periods in the history of science: the rise of a molecular understanding of life.
Despite its dense historical detail, Life’s Greatest Secret is an absorbing and, in places, thrilling book. The race to crack the genetic code is a story with considerable drama and it unfolds remarkably lucidly in Cobb’s telling.
You might not have noticed that I used the language of information when discussing DNA above. The “information” that is “encoded” in DNA gets “read” by cells. You likely didn’t notice because this is now a nearly reflexive way of talking about DNA, even in popular culture. It’s just obvious to us that DNA stores information—for curly hair or blue eyes—and it’s natural to think of it as an information storage device much like the hard disk of a computer. Yet one of Cobb’s main points is that this is a remarkably recent way of thinking about biology.
The rise of this style of thinking had everything to do with what was happening in other fields of science during and immediately after World War II. This period saw the emergence of two new sciences that focused on information. Claude Shannon and others articulated information theory, which quantified the amount of information that flows from sender to receiver during, say, electronic communication. And Norbert Wiener elaborated cybernetics, which formalized ideas like feedback loops, especially negative feedback loops. (A thermostat involves a feedback loop: a setting on a thermostat affects a room’s temperature and the temperature then affects the thermostat.)
Following these developments, some scientists grew excited at the prospect that the mathematical abstractions developed in these fields might provide a radical new way to think about life. Organisms might not look like the stuff of equations that describe the flow of information; but the new sciences hinted that they might well be. Information thinking, Cobb claims, played a crucial part in helping to define what came to be called the “coding problem,” a problem that dominated biology in the 1950s and 1960s.
To grasp the coding problem, we must first understand that our hereditary material, DNA, is a long molecule made of two strands that wind around each other, forming the double helix described by Watson and Crick in 1953. On one strand, four different chemicals are found: adenine, thymine, guanine, and cytosine, abbreviated A, T, G, and C. In principle, these DNA letters (also called “bases”) can occur along the strand in any order, for instance, AAGCTG…. The other strand of DNA carries a sequence of chemicals denoted by letters that “matches” the first strand: A on one strand always pairs with T on the other strand and G always pairs with C. So, in our example, the second strand would read TTCGAC…. The human genome, the total of all our genetic information, is made up of several billion of these pairs of DNA letters.
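The pairing rule can be written out as a minimal sketch. The function name `matching_strand` is illustrative only, and the sketch reads the second strand position by position, as the example above does; it ignores the fact that the two strands of a real double helix run in opposite directions.

```python
# Pairing rule stated in the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def matching_strand(strand):
    """Return the letters that sit opposite `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand)


print(matching_strand("AAGCTG"))  # TTCGAC
```

Because each letter determines its partner, either strand alone carries enough information to rebuild the other.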
Most of our bodies are not made of DNA. Instead, to a considerable extent, we’re made of a different class of molecule called proteins. For example, the beta-globin (part of hemoglobin) found in your red blood cells and the collagen found in your skin are proteins. A tremendous amount of work during the 1930s and 1940s, ably surveyed by Cobb, revealed that our genes somehow specify our proteins. Roughly speaking, each gene—which turns out to be a stretch of DNA a few thousand letters long—specifies one protein.[2] Thus the thousands of genes in the human genome somehow encode the thousands of different proteins that constitute much of your body.
So, finally, what are proteins? Proteins are long molecular chains made of many individual links called amino acids. Organisms use twenty different kinds of amino acid to build their proteins. Your beta-globin is a sequence of 146 amino acids in a particular order. If you were to switch any one amino acid in a protein with another amino acid, things might go terribly wrong. As Cobb notes, the normal and sickle-cell forms of beta-globin—the latter causes sickle-cell anemia—differ by only a single amino acid.
We can now state the coding problem simply: How does the sequence of A, T, G, and C in DNA determine the sequence of amino acids in proteins? This code, whatever it is, is the code of life. It is the precise physical connection between what we inherit from our parents and how our bodies are built.
This coding problem was formulated remarkably quickly after the discovery of the structure of DNA. Indeed, two weeks after the double-helix discovery but before the Watson-Crick paper was published in Nature, Crick wrote as follows to his son:
Now we believe that the DNA is a code. That is, the order of the bases (the letters) makes one gene different from another gene (just as one page of print is different from another).
Crick’s letter was auctioned in 2013 for $6 million.
Several weeks after Watson and Crick published their double-helix paper, they wrote, “The precise sequence of the bases is the code which carries the genetical information.” By 1957, Crick was emphasizing that by “information” he meant “the specification of the amino acid sequence of the protein.” These statements, Cobb says, are among the earliest explicit uses of the language of information by biologists. (The physicist Erwin Schrödinger had used somewhat similar language earlier.)
More important, the race to crack the code was on.
As biologists were soon to realize, that the coding problem could be stated simply did not mean that it could be solved easily. Much of Cobb’s book is given over to the two main approaches that were taken: theory and experiment.
Almost immediately after Watson and Crick announced the double helix, a good many mathematicians and physicists began offering possible solutions to the coding problem. The essence of their approach was combinatoric. Somehow the four letters in the language of DNA need to specify the twenty amino acids in the language of protein. This fact alone let theorists rule out some imaginable coding schemes. It’s obvious, for example, that individual letters of DNA taken singly can’t specify amino acids, since this scheme could only encode four amino acids. It was less obvious that theory could succeed in the positive task of finding the correct code. This didn’t seem to dampen theorists’ enthusiasm.
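The counting argument can be made concrete with a small sketch. It assumes, as the simplest schemes did, that every amino acid is specified by a word of DNA letters of one fixed length; the names `words` and `shortest_sufficient_length` are illustrative only.

```python
LETTERS = 4        # A, T, G, C
AMINO_ACIDS = 20   # kinds of amino acid used to build proteins


def words(length):
    """Number of distinct DNA words of the given length."""
    return LETTERS ** length


def shortest_sufficient_length():
    """Smallest word length that yields at least twenty distinct words."""
    length = 1
    while words(length) < AMINO_ACIDS:
        length += 1
    return length


for n in (1, 2, 3):
    print(n, words(n))               # 1 4, then 2 16, then 3 64
print(shortest_sufficient_length())  # 3
```

The arithmetic gives only a lower bound. It shows which word lengths are too short, not which scheme nature actually uses, which is why the positive task remained open.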
The physicist George Gamow, for example, fired off letters to biologists suggesting a “diamond” model in which DNA wrapped itself around a cylindrical surface, with the diamond-shaped gaps that appear between nearby bases on adjacent loops of the DNA helix somehow coding for amino acids. A calculation showed that this scheme could encode exactly twenty amino acids. If you can’t picture Gamow’s diamond model, don’t worry. Like many of the schemes that theorists proposed, this one proved not only fantastically clever but fantastic. Crick himself conceived a different clever model that could encode exactly twenty amino acids. It also proved wrong.
Fortunately, speculation about the code didn’t occur in an empirical vacuum. Some possible coding schemes, for example, placed constraints on which amino acids could occur next to each other in a protein. But when biochemists characterized many proteins from many species they found that a given amino acid might be followed by any of the twenty amino acids.
In addition, it was becoming clearer that DNA was not a physical template on which proteins got built. DNA did not even interact directly with the protein that it encoded. Instead, an intermediate molecule seemed to be involved. This intermediate proved to be RNA, a close cousin of DNA. Among other differences, RNA features a different chemical—abbreviated U—instead of T and is typically single-stranded, not double-stranded like DNA. Biologists soon realized that the sequence of letters on one strand of a DNA double helix encoded a matching sequence on an RNA molecule. If a DNA sequence reads AAGCTG, the matching RNA sequence reads UUCGAC. It was this RNA sequence that then (somehow) determined the sequence of amino acids in a protein.
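The DNA-to-RNA step can be sketched in the same spirit. This is a minimal illustration of the matching rule as the paragraph states it, with U standing in wherever T would otherwise appear; the function name `matching_rna` is hypothetical, and the sketch leaves out everything about how cells actually carry out the copying.

```python
# Each DNA letter is matched by its partner on the RNA, with U in place of T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def matching_rna(dna):
    """Return the RNA letters that match a DNA sequence, position by position."""
    return "".join(DNA_TO_RNA[base] for base in dna)


print(matching_rna("AAGCTG"))  # UUCGAC
```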
So by the mid-1950s biologists had realized something big: information in organisms flows from DNA through RNA to protein. Crick called this new idea the “central dogma” of biology.[3] Information can get from DNA into proteins but not the other way around. Incidentally, the central dogma was the final nail in the coffin of an idea associated with the French naturalist Jean-Baptiste Lamarck in the early nineteenth century: the inheritance of acquired characteristics. Whatever happens to you during your lifetime of experiences might well affect your body, but there’s no way for these effects to be transmitted from your body’s proteins back into your DNA, shaping what gets passed on to your children.
Despite these developments, the code itself remained undeciphered. Just which letters of DNA coded for just which amino acids? Biologists were no nearer to the answer than they were immediately after the discovery of the double helix. Indeed, by 1959 Crick lamented that the coding problem had entered a “confused phase.”
The confusion would soon disappear. But its resolution would not involve the clever calculations of the theorists. Nor would it involve the usual suspects, a small circle of brilliant biologists that orbited the yet-more-brilliant Crick. Instead, the code would be cracked by an obscure team: Marshall Nirenberg and Heinrich Matthaei of the National Institutes of Health in Bethesda, Maryland. Nirenberg, the older of the pair, was so unknown that his application to a conference on the genetic code was rejected in 1961. As Cobb puts it, “Ironically, while the great and the good of molecular biology were talking about the genetic code, Nirenberg and Matthaei were cracking it.”
Nirenberg and Matthaei’s approach was beautiful. It was also almost brutally direct. They used artificial RNA sequences that carried the same letter over and over—UUUUUUU…—and asked what kind of protein got made. (This was done in a test tube. The resulting protein did not need to be one that is found in nature.) The answer, in the case of UUUUUUU…, was a protein that carried the amino acid phenylalanine and nothing else. The code was giving way.
Further experiments, including an extraordinarily clever one designed by Crick and Sydney Brenner, soon showed that the code was “triplet.” Every three letters of DNA specifies one amino acid. By 1967, variations on Nirenberg’s experiment, performed by Nirenberg and Matthaei themselves as well as by Severo Ochoa, Gobind Khorana, and others—replete with some technical setbacks and a few errors—allowed experimentalists to decipher the entire genetic code.[4]
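Reading a triplet code amounts to chopping an RNA sequence into non-overlapping groups of three letters and looking each group up in a table. The sketch below does exactly that with a deliberately partial table of three entries (UUU and UUC for phenylalanine, GCA for alanine); the real table has sixty-four. The names `PARTIAL_CODE` and `translate` are illustrative, and the "?" placeholder for triplets missing from the table is an assumption of the sketch, not part of the code itself.

```python
# A deliberately partial lookup table: three of the sixty-four triplets.
PARTIAL_CODE = {"UUU": "Phe", "UUC": "Phe", "GCA": "Ala"}


def translate(rna):
    """Read RNA three letters at a time and look up each triplet."""
    usable = len(rna) - len(rna) % 3  # ignore one or two leftover letters
    triplets = [rna[i:i + 3] for i in range(0, usable, 3)]
    # "?" marks a triplet that this partial table does not cover.
    return [PARTIAL_CODE.get(triplet, "?") for triplet in triplets]


print(translate("UUUUUUUUU"))  # ['Phe', 'Phe', 'Phe']
print(translate("UUUUUCGCA"))  # ['Phe', 'Phe', 'Ala']
```

The first call mirrors the Nirenberg and Matthaei result: a run of U's yields phenylalanine and nothing else. The second shows two different triplets mapping to the same amino acid.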
Finally, given only a sequence of letters in DNA, biologists could say exactly what protein would result. Nirenberg was awarded the Nobel Prize in 1968. Upon announcement of the prize, a congratulatory banner was hung in his laboratory reading “UUU are great Marshall.”
The last third of Life’s Greatest Secret is devoted to bringing the history of the molecular side of genetics up to date. Of the many discoveries that followed the cracking of the genetic code, perhaps the most fundamental was the finding that the code is nearly universal across all life on earth. (Some minor variants on the code exist.) This finding is of deep evolutionary significance. All of us—bacteria, fungi, plants, and people—share the same code because we all share an ancestor that lived billions of years ago and that employed this code.
It’s also reasonably clear why the code has remained fixed through this vast stretch of evolutionary time. If it were to change, with, say, GCA encoding something besides the usual amino acid alanine, the structure of hundreds or thousands of proteins would suddenly and simultaneously change, a certain formula for disaster for any organism that tried it. While there’s no obvious physical or chemical reason why certain letters of DNA encode certain amino acids, once life settled on a code early in evolutionary history, it couldn’t be changed without catastrophic consequences. Crick called this the “frozen accident” hypothesis.
Work over the last few decades also revealed that not all DNA codes for protein. In human beings, 98 percent of our genome is “noncoding” DNA. Some of this DNA seems to do nothing whatever. Other regions of noncoding DNA act in gene regulation, that is, they help to determine when and in which cells a gene will produce its protein product. In addition, many species, including human beings, have split genes: one stretch of DNA might code for the first part of a protein, immediately followed by a stretch of DNA that does little or nothing, immediately followed by a stretch of DNA that codes for the remainder of the same protein.
In the last part of Life’s Greatest Secret Cobb turns to developments of social significance that followed from the cracking of the code particularly and advances in molecular genetics generally. Two are of special importance: the creation of genetically modified (GM) crops and the attempt to cure human genetic disease.
Biologists have had great success in using DNA technologies to create crops that produce what are deemed desirable proteins. These engineered proteins typically confer resistance to pests or to herbicides and allow considerable increases in yield. Despite lingering popular concerns, especially in Europe, about “Frankenfoods,” the transformation of agriculture by genetic technologies is mostly a fait accompli in America. As Cobb notes, “in 2014, 94 per cent of US soybean crops were GM, as were 93 per cent of corn crops, 95 per cent of sugar beet crops and 96 per cent of cotton.”
By contrast, physicians have had mixed success in using genetic technologies to cure inherited diseases in human beings. (These interventions are usually intended to inject a normal, healthy copy of a gene into a patient’s diseased tissue, such as the liver, not into the patient’s eggs or sperm. As Cobb notes, the genetic modification will not, therefore, affect future generations.) Genetic technologies have not, so far, transformed medicine in the same way that they have agriculture.
This may be about to change. Though the subject has suffered from enough hype to make anyone skeptical—remember how we were told in the 1990s that the Human Genome Project would revolutionize medicine?—there are reasons for thinking that new “gene-editing” technologies may finally open the door to more effective treatment of genetic disease. These new technologies are generally referred to as CRISPR (pronounced “crisper”) or, more accurately, the CRISPR-Cas9 system. The CRISPR technique is complex but, at base, involves a genetic trick that recognizes a particular DNA sequence in an organism (such as a mutant gene), cuts it out of the organism’s double helix, and then allows it to be replaced by an alternative DNA sequence (such as a normal, healthy gene). CRISPR has been used successfully in many species and is generally very efficient. While hurdles remain, the recent discovery of these new techniques (they were announced over the last five years) may ultimately bring about major changes in medical treatment, as Cobb rightly emphasizes.[5]
While these more modern developments in molecular genetics are certainly important, the last third of Cobb’s book can’t compete with the rest. These later chapters are in places a bit rambling and sometimes read more like a textbook and less like the historical thriller of earlier chapters. Life’s Greatest Secret might have been better off without this material. But it would be churlish to make too much of that. On the whole Cobb tells his story beautifully and his book is a pleasure to read. Packed with fascinating detail, Life’s Greatest Secret is a major accomplishment, particularly for an author who is also a practicing scientist. Though I like to think that I have a good grasp of the history of genetics and evolutionary biology, I was repeatedly surprised by events in Cobb’s tale.
Several major themes emerge from Life’s Greatest Secret. The first concerns the influence of information theory and cybernetic thinking in biology. These disciplines, Cobb concludes, had an important part in twentieth-century biology “but not in the way in which their partisans might have hoped for.” In the end, the information sciences provided biologists with loose but useful metaphors and analogies, a language that allowed scientists to think and speak in new ways. But the high-powered mathematics of these fields proved mostly impotent in biology. No one, for instance, used Shannon’s equations to say anything especially interesting about organisms. (A fact that didn’t surprise Shannon himself, who was skeptical all along of this attempted use of his theory.) The history of science, like the history of anything, is characterized by considerable nuance, and one subtlety is that the information sciences were, in one sense, critical to the rise of modern biology and, in another sense, beside the point.
A second theme concerns the respective roles of theory (of any sort) versus experiment in biology. In the early 1960s, mathematicians confidently declared that “it will be interesting to see how much of the final solution [to the coding problem] will be proposed by mathematicians before the experimentalists find it.” As Cobb concludes, the “answer…was simple: not one single part of it.”
The interesting question is why theory failed here. Part of the answer, as Cobb emphasizes, is related to Crick’s idea of the frozen accident. The genetic code seems at least partly arbitrary. It represents a half-decent arrangement arrived at by the imperfect, tinkering process of evolution by natural selection and, once settled on, it couldn’t be “improved,” or made somehow more systematic. In such a situation theory is likely useless.
I suspect there’s another, related, reason that theory contributed so little to cracking the code. There was, at bottom, a mismatch between the nature of the problem and the nature of much biological theory. Successful theory in biology typically plays a different part than does successful theory in, say, physics. Theory in biology often guides thought, or trains intuition, or points to patterns that might hold approximately in nature. Only rarely does biological theory provide the essentially exact results that physicists are accustomed to. (And in biology approximate results, or even rules of thumb, are often more useful than exact results.) This kind of broad-stroke theory doesn’t provide much help with a problem as specific as the coding question.
A rough analogy captures these kinds of concerns. Mathematical theory might tell you something interesting and general about combination locks: for example, that they should require a sequence of three or more numbers to prevent a would-be thief from opening them in a few random tries. But place a particular combination lock before a theorist and he’s probably no better than the rest of us at opening it.
Finally, and perhaps most important, Life’s Greatest Secret highlights the power of the beautiful experiment in science. Though Cobb pays less attention to this subject than he might have, the period of scientific history that he surveys was the golden age of the beautiful experiment in biology. Biologists of the time—including Nirenberg with his UUU, Crick and Brenner with their triplet code work, and others including Matthew Meselson, Franklin Stahl, and Joshua Lederberg—were masters of the sort of experiment that, through some breathtakingly simple manipulation, allowed a decisive or nearly decisive solution to what previously seemed a hopelessly complex problem. Such experiments represent a species of intellectual art that is little appreciated outside a narrow circle of scientists.
Ironically, the cracking of the genetic code, together with other developments, has ushered in a very different era in biology, that of big data. Computers now burst at the seams with DNA and protein sequences that derive from the whole genomes of thousands of species sequenced by automated machines. Many biologists use sophisticated statistics in an attempt to infer patterns from these data. Increasingly, biologists seem drawn to such inference, however indirect, and fewer seem captivated by the ideal of the decisive experiment. These indirect approaches have certainly yielded valuable insights and it would be absurd to doubt that they will continue to do so. Big data provide important new tools to biology and medicine. But the larger lesson of Life’s Greatest Secret is one that may be worth remembering. When scientists require definitive answers, not merely suggestive patterns, they require experiments that are decisive and, if all goes well, beautiful.
DNA: A Metaphor? July 14, 2016
[1] The story of the discovery of the structure of DNA was told by Watson himself in his best-selling memoir, The Double Helix (Atheneum, 1968). Crick later recounted his scientific career, including the discovery of the double helix, in his own memoir, What Mad Pursuit (Basic Books, 1988). Maurice Wilkins and Rosalind Franklin also made important contributions to the discovery of DNA’s structure.
[2] This one gene–one protein idea isn’t quite accurate. But it’s often accurate and will serve for present purposes. Genes also vary considerably in length.
[3] Crick confessed later that he hadn’t fully understood the connotations of the word “dogma” when he coined the misleading phrase “central dogma.” Unlike most dogmas, the central dogma of molecular biology is of course supported by mountains of data.
[4] Because the code is triplet and any of four letters might appear at a given site in DNA, there are 4 × 4 × 4 = 64 different DNA triplets that need to encode only twenty amino acids. Consequently, the code is redundant: more than one triplet can encode the same amino acid. The code is not, however, ambiguous: given any triplet, one knows exactly which amino acid will appear.
[5] For more details, see “The Age of the Red Pen,” The Economist, August 22, 2015. This article also considers the controversial possibility that CRISPR could be used to edit the “germ line” of human beings, i.e., to make changes to DNA that are inherited by future generations.