Most science proceeds quietly. Many scientific problems are tackled by one or a few laboratories and the results are published in a journal that the public has never heard of. And even when science does make it into the mainstream press, it’s almost always briefly and after the fact: the study is done and its findings are reported in a few sentences.
The effort to sequence the human genome—to determine the exact order of the “letters” making up the DNA sequence of our species—was never like this. Instead, the press covered the project for years before its completion. And when the project was finally finished, its results were the first ever to be announced from the White House. While much of this attention reflected the possible medical implications of the work—the public was told that the human genome project would cure what ails it—at least some of the attention reflected the drama of the undertaking itself. Sequencing the human genome was no ordinary scientific venture but one that featured big money, big personalities, and a big race.
The race was between two programs, one public (centered at the National Institutes of Health) and the other private (centered at a company called Celera Genomics). As races go, this one was not particularly sportsmanlike. Instead, the contest featured more than its fair share of political intrigue and mudslinging. Now one of the key players—Craig Venter, leader of Celera’s effort—tells his story. (Or, more accurately, his side of the story.) The resulting tale is part autobiography, part popular science, and part an attempt to settle old scores. Apparently, there are plenty to settle.1
Venter’s youth offered little reason to expect success in life. Growing up in Eisenhower-era California, he was a wild child and an abysmal student. By adolescence, he seemed destined for a life as a beach bum. Then Vietnam intervened. Joining the navy, he became a medic and endured the Tet Offensive as well as the horrors of performing triage. His experiences during the war, which included an aborted suicide attempt, seemed to transform Venter and he returned to the US determined to make something of himself.
Attending community college and then the University of California, San Diego, he settled on biology. He remained at San Diego to perform graduate work on how hormones affect cells and found, to his surprise, that he was a formidable researcher. He rapidly racked up an impressive number of published articles. Following several academic positions, in 1983 Venter landed at the NIH in Bethesda, a near paradise for biologists: research money was abundant and grant applications unnecessary.
During his time there, Venter’s work gradually shifted to DNA. Following a series of squabbles with James Watson and other NIH leaders—disputes that foreshadowed his later, bitter feuds with federal scientists—Venter departed in 1992, choosing instead to lead research at a private institute where he was free to pursue his increasingly ambitious projects on DNA sequencing. In the end, of course, Venter set his sights on the human genome, opting to compete directly with the public program. He also decided to decode the genome using a controversial method that differed radically from that favored by scientists in the public effort. The race that ensued makes up much of the story of A Life Decoded. By 2000, with the completion of the project, the former beach bum found himself on the cover of Time magazine, along with Francis Collins, who led the public program.
The portrait of Venter that emerges from A Life Decoded is fascinating, if less than wholly attractive. Confident, domineering, and a risk-taker, Venter enjoys his reputation as the bad boy of biology. Most of all, he seems determined to win, whether sequencing or sailing (his avocation). Though Venter longs to be seen as one of the greats in the history of science, A Life Decoded makes it clear he is only partly a scientist. He is also part entrepreneur and part PR man. (Alas, he is also only part writer. As Venter acknowledges, he wrote text and then hired a reporter to help “trim and reorganize” his work. The result is some annoyingly bumpy prose.)
Though relentlessly self-serving, Venter manages, almost despite himself, to produce a book that’s engaging and, in places, charming. A Life Decoded is a tale told by an ego of epic proportions, but once this fact is accepted, the drama of Venter’s narrative takes hold and even his bravado becomes perversely, if mildly, entertaining. In any case, the personality on display in the book—combative, with healthy doses of chutzpah and showmanship—surely had much to do with Venter’s success in the ruthlessly competitive world of Big Science, a world that looks more like Russian capitalism than it does popular pictures of the noble pursuit of truth. While the partisan spin that runs through A Life Decoded won’t be of much lasting interest, the science that underlies the human genome story will be.
DNA is the material that gets passed from parents to offspring and which partly explains why offspring resemble parents. A long molecule, DNA is a string of chemicals, each of which can be represented by a letter. Unlike English, DNA uses only four letters—A, T, G, and C. Like the information carried by this sentence, the information carried by DNA depends on the precise sequence of letters. Genes are particular regions of DNA that have a special role: they tell a cell to make a certain kind of molecule, namely a protein. One gene sequence (say, AATTCGGTC…) tells the cell to make one kind of protein, while a different sequence (TTCGCTAGC…) tells the cell to make a different kind of protein. But not all DNA functions as genes: much of our DNA is filler that sits between genes.
The sum total of all this DNA, genes and filler, is known as the genome. And genomes are very large. The human genome, for instance, is about three billion letters long and includes around 30,000 genes. These three billion letters of DNA are not, however, all strung together in a single molecule. Instead, the human genome is divided into twenty-three different molecules of DNA and each of these molecules resides on a small cellular body called a chromosome. So chromosome 3, say, carries one particular long DNA molecule and includes a distinct set of genes; chromosome 4 carries a different long DNA molecule and includes a different set of genes, and so on.
Until recently, decoding the sequence of letters in any stretch of DNA required tedious work in the laboratory and biologists typically confined themselves to sequencing only one or a few genes. In the late 1980s a new technology—automated DNA sequencing—promised to change all this, simplifying and speeding the process. Inject DNA from an organism into the machine and, by means of chemistry and lasers, the device would reveal the desired DNA sequence.
As automated sequencing improved, biologists raised their expectations and some began considering sequencing not merely this or that individual gene but entire genomes. In the Nineties, a group of biologists convinced the federal government to fund an extensive effort to sequence the entire human genome, the largest concerted undertaking in the history of biomedical research and one that would ultimately cost about three billion dollars.
To decode the genome, leaders of the public project, which was based at the NIH—but also included scientists at the Department of Energy as well as those funded by the Wellcome Trust in Britain—settled on a two-step strategy. In the first, the genome would be broken into large fragments of DNA and the physical location in the genome of each fragment would be painstakingly ascertained. A particular fragment might, for instance, sit at the tip of chromosome 17. Once a large collection of fragments had been mapped, researchers would choose a subset of fragments that showed little overlap with one another. This would ensure that the project was left with a manageable number of fragments that, taken together, covered the whole genome. The public project would then move to step two: each DNA fragment in this set would be sequenced. This work would be farmed out to an international consortium of laboratories, each running automated sequencers.
Venter, aggressive and fond of shortcuts, stood this systematic approach on its head. He advocated an alternative approach called whole genome shotgun sequencing. In shotgun sequencing, many copies of the genome are sheared randomly into small pieces. A huge number of these pieces, many of which overlap, are then immediately sequenced using automated machines. A laboratory thus has no idea where the particular piece of DNA it is sequencing resides in the genome. It might derive from chromosome 2 or from chromosome 17, etc. And because the pieces of DNA are generated randomly, some parts of the genome might get sequenced once, some twice, others three times, and so on. (Worse, some parts might not get sequenced at all. But enough pieces are sequenced that this outcome is rare.)
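The unevenness described here is easy to see in a toy simulation. The sketch below drops reads at random onto an imaginary genome and counts how many times each position is hit. The genome length, read length, and number of reads are arbitrary assumptions chosen for illustration, not Celera's figures. The comparison with e^(-c), where c is the average depth, follows the standard Lander-Waterman approximation for randomly placed reads.

```python
import math
import random
from collections import Counter


def simulate_coverage(genome_len, read_len, n_reads, seed=0):
    """Return the read depth at every position for randomly placed reads."""
    rng = random.Random(seed)
    depth = [0] * genome_len
    for _ in range(n_reads):
        # Each read starts at a uniformly random position (toy assumption).
        start = rng.randrange(genome_len - read_len + 1)
        for pos in range(start, start + read_len):
            depth[pos] += 1
    return depth


# Illustrative values only: a 100,000-letter "genome" and 1,000 reads of 500 letters.
genome_len, read_len, n_reads = 100_000, 500, 1_000
depth = simulate_coverage(genome_len, read_len, n_reads)

mean_depth = n_reads * read_len / genome_len   # average times each letter is read
histogram = Counter(depth)                     # how many positions were read k times
uncovered = histogram[0] / genome_len          # fraction never sequenced

print(f"average depth: {mean_depth:.1f}")
for k in range(4):
    print(f"positions read {k} times: {histogram[k]}")
print(f"fraction never sequenced: {uncovered:.4f}")
print(f"Lander-Waterman estimate e^-c: {math.exp(-mean_depth):.4f}")
```

Raising the number of reads pushes the never-sequenced fraction toward zero, which is why enough redundancy makes gaps rare.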
After decoding many thousands of such pieces, one faces the daunting task of correctly stitching them all together into a complete genome. This so-called genome assembly step is performed using computers that search for regions of sequence overlap between pieces of DNA: if the right end of one piece of DNA has the same sequence of letters as the left end of another piece of DNA, they can be overlaid to form a single, longer sequence. By repeating this process over and over, sophisticated computer algorithms can stitch together an entire genome—at least in principle.2
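The overlap idea can be sketched in a few lines. What follows is a deliberately naive greedy assembler, not the algorithm Celera used: it repeatedly merges the two pieces with the longest matching end-to-start overlap. The function names, the minimum overlap of three letters, and the short example sequence are all assumptions made for illustration. Real genomes contain repeats and sequencing errors that defeat so simple a scheme.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads by always joining the pair with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read; they add no new letters.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping pieces of a made-up 21-letter sequence, given out of order.
pieces = ["TGAAGTTCAGC", "ATGGCGTACC", "GTACCTGAAG"]
print(greedy_assemble(pieces))
```

The nested loop compares every piece with every other piece, so the work grows roughly with the square of the number of pieces. That is one reason assembly becomes so much harder as genomes get larger.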
The shotgun method thus required both extensive sequencing and powerful computational capacities. On the upside, it promised faster results than the public approach; one needn’t spend years mapping every piece of DNA to its location in the genome. On the downside, there was no guarantee that the computer algorithms were up to the task of assembling the many pieces of DNA into a seamless whole—potentially leaving Venter with many fragments but no genome.
Venter’s first test of the shotgun method involved not human beings but a tiny microbe, Haemophilus influenzae, which can cause serious infections such as childhood meningitis. And it worked. In 1995, Venter’s private venture, TIGR, unveiled the first complete genome sequence of any free-living species. There could now be no doubt that Venter was a force to be reckoned with in the nascent field of genomics, the sequencing and analysis of whole genomes. Though providing proof in principle for shotgun sequencing, the Haemophilus genome is small—it includes only 1.8 million letters and about 1,700 genes. It remained unclear, therefore, if shotgun sequencing could be ramped up to species having far larger genomes, especially as the computational challenges involved in assembling pieces into a whole explode as genome size increases.
Venter, who by the late 1990s was heading a new private venture called Celera (from the Latin for swiftness), thus turned his attention to a far fancier creature, the fruit fly Drosophila melanogaster. Drosophila has a large genome—120 million letters long—and encodes over 13,000 genes. Importantly, Drosophila provided a test not only of the shotgun method’s ability to decode a large genome but of its accuracy. Previous work by Drosophila geneticists had generated a great deal of high-quality DNA sequence that could be compared to Celera’s new sequence. Again, Venter’s approach worked and Celera unveiled the complete genome sequence of the fly.
Confident of the power of shotgun sequencing—and convinced that the public project’s approach was needlessly slow—Venter finally turned to the human genome. By the time Celera began actual sequencing of human DNA, in September 1999, the public project had already decoded about 25 percent of the genome. Celera started with its own samples of DNA from five people, three women and two men (one of whom was Venter himself).3
Celera’s most pressing problems were technical. To decode the three billion letters of the genome, Venter led a factory-style operation that featured a new generation of automated sequencers, which were initially unreliable. After the bugs in these machines were worked out, Celera was, on a typical day, simultaneously running three hundred sequencers, each machine costing about $300,000. Celera consequently decoded a staggering fifty to one hundred million letters of DNA per day.
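A little arithmetic puts these figures in perspective. The sketch below uses only the numbers quoted above, plus one assumption: because shotgun sequencing reads each stretch of DNA several times over, the calculation is repeated for a fivefold redundancy. That multiple is purely illustrative and is not a figure from Venter's book.

```python
GENOME_LETTERS = 3_000_000_000              # size of the human genome
LETTERS_PER_DAY = (50_000_000, 100_000_000)  # Celera's quoted daily output range
SEQUENCERS = 300
COST_PER_SEQUENCER = 300_000                 # dollars


def days_for_coverage(genome_letters, letters_per_day, coverage=1):
    """Days needed to read the genome `coverage` times at a given daily rate."""
    return genome_letters * coverage / letters_per_day


hardware_cost = SEQUENCERS * COST_PER_SEQUENCER
print(f"sequencers alone: ${hardware_cost:,}")

for rate in LETTERS_PER_DAY:
    for coverage in (1, 5):  # 5 is an assumed redundancy, for illustration only
        days = days_for_coverage(GENOME_LETTERS, rate, coverage)
        print(f"{rate:,} letters/day, {coverage}x coverage: {days:.0f} days")
```

At the quoted rates a single pass over three billion letters takes one to two months, which helps explain how a project begun in September 1999 could be announced the following June.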
As the private and public initiatives progressed, relations between the projects deteriorated badly. Several conflicts arose. One was that the public program was, each day, depositing all new DNA sequences into a publicly accessible database. Celera, as a private venture, had no obligation to do so and it did not. But as Venter acknowledges in A Life Decoded, Celera nonetheless helped itself to the public data, using it whenever it proved convenient (and it appears it often did). Worse, serious concerns about Celera’s impending intellectual property claims arose among scientists in the public program. Was Venter planning to charge researchers exorbitant fees to gain access to the human data? Was he planning to patent the human genome?
Mostly, though, it’s hard to escape the conclusion that bitter relations between the programs reflected insecurity. Although each party loudly proclaimed the superiority of its approach, each was terrified that the other might first announce completion of the genome. After a series of failed negotiations between the leaders of the two programs, President Clinton intervened and, behind closed doors, essentially forced the parties to collaborate. The political pressure worked, there were renewed conversations between the programs, and, on June 26, 2000, the genome race ended in an anticlimactic tie. Clinton presided over an East Room ceremony in which Venter and Collins jointly announced the sequencing of the human genome. Predictably but absurdly, the ceremony seemed a love fest, with both parties projecting a spirit of cooperation. The true extent of cooperation was apparent the following year when the groups decided to publish their findings separately.4
Although there has been considerable debate about whether shotgun sequencing really worked with human beings (to what extent did Celera rely on public data to assemble the genome?), there has also been speculation that Celera pulled its punches (did Venter cooperate with the public project for purely political reasons, e.g., to avoid embarrassing the NIH?). In his book, Venter’s own position on the latter issue sometimes seems incoherent. Now and then he implies that Celera could have handily beaten the public project. But given that Celera used public data, it’s unclear what this would even mean. (How can you “beat” the runner who hands you the baton?) In any case, history will clearly credit both the public and private initiatives with the accomplishment. And clearly it should.
Genomics did not of course end with the human genome project. Shortly after the project’s completion, Celera sequenced the complete genomes of the mouse and of a mosquito species that transmits the malarial parasite. And as sequencing and computational capabilities improved, genome projects grew both faster and cheaper. Indeed various groups have decoded the complete genomes of several hundred species.
Venter now seems intent on turning the genomic enterprise upside down. Instead of decoding the DNA of naturally occurring species, he hopes to create new ones. Venter’s ambitious goal is to design and synthesize novel genomes that can perform important environmental or industrial tasks like producing alternative energy sources, i.e., “green” biofuels. Instead of extracting increasingly rare fuels from the earth, why not engineer microbes to produce, say, octane from sugar or cellulose?
Venter’s efforts in the new field of “synthetic genomics” have involved several steps. First, he has shown that the genome of one bacterial species can be injected into the cell of another, yielding a living cell that now produces whatever products its new genome tells it to. Ultimately, this technology could be used to allow the injection of artificial genomes into cells. Second, Venter and colleagues have tentatively identified the minimal set of genes required for life. (The number appears to be about four hundred.) Venter hopes that a microbe carrying such a minimalist genome could, in the future, be used as a scaffold onto which new genes are added when organisms are being engineered to perform particular jobs. Third, Venter has shown that he can manufacture a genome in the laboratory.
On January 24, 2008, Venter announced that a research team at his latest private venture, the J. Craig Venter Institute, produced from scratch the entire genome of the bacterium Mycoplasma genitalium. Beginning effectively with bottles of the chemicals represented by DNA letters A, T, G, and C, they used biotechnological trickery to assemble over half a million letters of DNA in the correct order, yielding a man-made Mycoplasma genome. Although this effort produced what is essentially a natural Mycoplasma—not a new species—Venter’s goal is to use the same technology to synthesize novel genomes, genomes that might be put to work in solving environmental problems.5
At this point, synthetic genomics remains mostly a vision and it’s hard to assess its likelihood of success, whether environmental or economic. Indeed, the field could face an all-or-none future: synthetic genomics might revolutionize the production of fuels or it might well fizzle in the face of alternative, and possibly lower-tech, approaches. But Venter’s work at least suggests that any remaining barriers to synthetic genomics are more likely economic than scientific.
Venter devotes little of A Life Decoded to the idea of synthesizing genomes; instead, his focus throughout most of the book is on the biology of natural genomes. Surprisingly, though, he spends little time explaining why this kind of genomics matters. His main attempt involves stand-alone boxes of text that consider the supposed implications of his own genome sequence. These boxes are mostly disappointing. Indeed a few are merely devoted to showing that Venter himself isn’t afflicted by some mutation (e.g., one causing muscle fatigue and cramps), a fact that was already clear from his robust constitution (in high school, Venter was a near-Olympic-level swimmer). If this were the best genomics could do by way of enlightenment, there would seem little cause for excitement.
The irony is that Venter’s obsessive focus on his own genome runs the risk of selling genomics short. For there can be no doubt that the field has important implications for at least two disciplines, medicine and evolutionary biology.
In medicine, genomic data should allow easier detection of genes causing disease. One way this can happen is through so-called association studies. In such studies, we take two groups of people—those suffering diabetes, say, and those not—and we study the types of DNA sequences they carry at thousands of random sites throughout the genome. If most individuals with diabetes share a DNA sequence at one of these sites that differs from the sequence carried by most individuals who do not have diabetes, a gene contributing to diabetes may reside nearby. With a click of a mouse, we can then search genomic databases to see which genes sit near this site.6
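The logic of an association study can be sketched with a simple two-by-two comparison at each site. The example below is a minimal illustration under stated assumptions: the site names and counts are invented, the test is an ordinary Pearson chi-square with one degree of freedom, and the Bonferroni correction is one simple guard against the false positives that arise when many sites are tested at once. Real studies involve far more statistical care.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f.) and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the p-value has a closed form via erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts per site:
# (cases with variant, cases without, controls with variant, controls without)
sites = {
    "site_1": (60, 40, 40, 60),
    "site_2": (50, 50, 50, 50),
}

alpha = 0.05
threshold = alpha / len(sites)  # Bonferroni: divide by the number of sites tested

for name, counts in sites.items():
    chi2, p_value = chi_square_2x2(*counts)
    flag = "candidate" if p_value < threshold else "no evidence"
    print(f"{name}: chi2={chi2:.2f}, p={p_value:.4f} -> {flag}")
```

With only two sites the threshold is lenient. With thousands of sites it becomes thousands of times stricter, and a result like that for site_1 would no longer stand out on its own.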
In principle, at least, similar sorts of studies might lead to “personalized medicine.” In addition to genetically analyzing who gets a disease, we can genetically analyze who responds best to particular medicines if they have the disease. A physician could then examine your individual genome sequence—which you might carry on a CD—and conclude that, in view of your genetic makeup, drug X is best for you.
Another important way genomics could allow detection of genes causing disease is through use of “microarray expression chips.” This technology lets one determine which of many thousands of genes are expressed, i.e., are producing their protein product, in a particular kind of tissue. We can, therefore, take two kinds of tissue, for example, healthy skin versus cancerous skin, and compare the genes that are producing their products in each. Are some genes expressed only in cancerous skin and not in healthy skin? If so, these genes may play a part in causing the cancer.7
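In its simplest form the comparison amounts to asking which genes are switched on in one tissue but not the other. The sketch below assumes made-up gene names, made-up expression levels, and an arbitrary cutoff for calling a gene "expressed"; actual chip analyses rest on replicated measurements and statistical tests rather than a single threshold.

```python
def expressed_genes(levels, threshold):
    """Return the set of genes whose measured level meets the cutoff."""
    return {gene for gene, level in levels.items() if level >= threshold}


# Hypothetical expression levels (arbitrary units) for four invented genes.
healthy_skin = {"GENE_A": 12.0, "GENE_B": 0.3, "GENE_C": 8.5, "GENE_D": 0.1}
cancerous_skin = {"GENE_A": 11.0, "GENE_B": 9.7, "GENE_C": 0.2, "GENE_D": 0.2}

THRESHOLD = 1.0  # assumed cutoff for calling a gene "expressed"

on_in_healthy = expressed_genes(healthy_skin, THRESHOLD)
on_in_cancer = expressed_genes(cancerous_skin, THRESHOLD)

only_in_cancer = on_in_cancer - on_in_healthy    # candidates worth a closer look
only_in_healthy = on_in_healthy - on_in_cancer   # genes silenced in the tumor

print("expressed only in cancerous skin:", sorted(only_in_cancer))
print("expressed only in healthy skin:", sorted(only_in_healthy))
```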
In principle, this technology could also radically change the way some diseases are diagnosed. If we know the “expression profiles” of many different diseases—that is, which genes are and are not expressed in each disease—we could sample diseased tissue from a patient, perform a chip analysis, and determine its expression profile: Does it match that of any known disease?
In evolutionary biology, whole genome sequencing has opened the door to so-called comparative genomics. We can now contrast the entire genomes of many species—say, human beings with our closest relative the chimpanzee—asking how they differ. (The answer, in this case, is not much. Homo sapiens and Pan troglodytes differ at only a little more than 1 percent of their DNA letters. Within genes, these differences are concentrated in DNA that plays a role in immunity, response to pathogens, and reproduction.)
We can also use data from genomes to make inferences about which evolutionary forces drive the divergence of species. There are two main possibilities. One is Darwin’s natural selection: given a supply of random mutations, species might diverge through time as each adapts to its environment. If one DNA sequence helps one species adapt to its environment and a different DNA sequence helps another species adapt to its environment, the two species will diverge genetically. The other possibility is “genetic drift.” If two DNA sequences yield organisms that are equally fit (i.e., have the same chance of survival), one sequence could, just by chance, slowly become more common in one species and the other sequence could become more common in the second species.
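Genetic drift is easier to grasp when watched in action. The sketch below uses the textbook Wright-Fisher model, in which each generation is formed by drawing copies at random from the previous one. The population size, the starting frequency, and the number of generations are assumptions chosen to make the effect visible quickly. Neither sequence has any advantage here, so whatever divergence appears is due to chance alone.

```python
import random


def wright_fisher(pop_size, start_freq, generations, seed):
    """Track the frequency of one of two equally fit sequences under pure drift."""
    rng = random.Random(seed)
    count = round(start_freq * pop_size)
    trajectory = [count / pop_size]
    for _ in range(generations):
        freq = count / pop_size
        # Each member of the next generation copies a random parent.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        trajectory.append(count / pop_size)
        if count in (0, pop_size):
            break  # one sequence has been lost or has taken over entirely
    return trajectory


# Two isolated "species" start with the same 50/50 mix and drift independently.
species_one = wright_fisher(pop_size=50, start_freq=0.5, generations=500, seed=1)
species_two = wright_fisher(pop_size=50, start_freq=0.5, generations=500, seed=2)

for name, path in (("species one", species_one), ("species two", species_two)):
    print(f"{name}: final frequency {path[-1]:.2f} after {len(path) - 1} generations")
```

Running the simulation with different seeds shows the same starting mix ending up in different places, which is all that drift requires for two species to diverge.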
The statistical analyses required to determine whether species diverge by natural selection or genetic drift are subtle, but the genomic data required are simple: we compare the sorts of DNA sites that differ between two species with those that differ within the species. While evolutionary biologists used to perform such tests on only one or a few genes, in some cases we can now do so for all the genes in a genome. A recent study, for example, found that about 20 percent of the genes distinguishing two species of fruit fly diverged by Darwinian natural selection. This is a much higher percentage than many evolutionary biologists would have guessed a few years ago. This study highlights what is surely one of the most important consequences of genomics: the sheer volume of data has grown enormously and, with it, our confidence in our empirical tests of various theories.8
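One widely used test with this shape is the McDonald-Kreitman test, which sorts differences at a gene into four bins: changes that alter the protein versus those that do not, counted both between species and within a species. The sketch below assumes that framing and uses invented counts; the variable names Dn, Ds, Pn, and Ps are conventional shorthand, not terms from the study mentioned above. It computes two common summaries and omits the significance testing a real analysis would require.

```python
def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds); values below 1 suggest an excess of protein-altering divergence."""
    return (pn / ps) / (dn / ds)


def alpha(dn, ds, pn, ps):
    """Estimated share of protein-altering differences between species driven by selection."""
    return 1 - (ds * pn) / (dn * ps)


# Hypothetical counts for a single gene:
dn, ds = 20, 40   # differences BETWEEN species: protein-altering, silent
pn, ps = 10, 80   # differences WITHIN a species: protein-altering, silent

print(f"neutrality index: {neutrality_index(dn, ds, pn, ps):.2f}")
print(f"alpha: {alpha(dn, ds, pn, ps):.2f}")
```

Under pure drift the ratio of protein-altering to silent differences should be about the same between species as within them. In this invented example it is four times higher between species, the kind of excess that points toward natural selection.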
Surprisingly, these medical and evolutionary enterprises are not entirely separate. In fact, medical inferences about the likely role of a gene in disease are often informed by evolutionary data. If, for instance, an association study suggests that one of three genes in a small chromosomal region might contribute to a disease, we can use evolutionary databases containing DNA sequences from hundreds of species to find the most similar version of each of these three genes in “model organisms” like the mouse Mus or the fly Drosophila. We can then ask: What happens if, one by one, these genes are deliberately “knocked out” by mutation in these species? Do any of these mutations yield mice or flies that suffer problems that resemble those in the human illness? If so, the relevant gene is a good candidate for the (partial) cause of the disease.
Despite these kinds of potential contributions to medicine and biology, Venter worries that expectations for genomics may run too high among the public. Such worries partly reflect a history of hype. Early claims for the importance of the human genome project were absurdly overblown. Some scientists went so far as to suggest that the project would, in a philosophical way, change our understanding of what it means to be human. (A category mistake of the first magnitude. What it means to be human is of course just what it was before we knew the order of three billion As, Gs, Cs, and Ts.) And in the wake of the human genome project’s completion, one science reporter wrote that “we are living through the greatest intellectual moment in history.”
While Venter can also succumb to hyperbole—he describes the human project as “one of the greatest, most exciting, and, potentially, most beneficial scientific adventures of all time”—more often he’s sensitive to its perils. He emphasizes, for example, that any major medical therapies deriving from genomic findings are probably a long way off. He also makes it clear that the public’s inflated hopes for the human genome project may reflect its belief in a naive genetic determinism. The public must, he argues, understand that our fates are not usually tied tightly to our genes. Instead, environmental and psychological factors may affect health at least as much as do genes. (Venter says his experiences in Vietnam taught him to distrust simple-minded reductionism; considering the survival and recovery of his patients, he concluded that attitudes often mattered as much as physiology.)
Even when genes do matter, a little knowledge can be a dangerous thing. A genetic test might, for example, show that you carry a gene contributing to a disease. For some diseases, you are then very likely to develop the illness. But for other diseases, the risk associated with the gene may be slight or the gene might interact with other genes in unexpected ways, making prediction difficult.9
Scientists can also suffer from unrealistically high hopes, though usually for different reasons than the public. One is that, when embracing a powerful new approach like genomics, scientists sometimes overlook the cost in missed opportunities paid by abandoning other approaches. In the early twentieth century, for example, the rise of genetics was largely responsible for the demise of embryology. Biologists, excited by the rediscovery of Mendelism, often abandoned their research on embryos to take up the study of inheritance. (T.H. Morgan, the father of American genetics, was originally an embryologist.) Embryology recovered only seven decades later when studies revealed the identity of key genes—including so-called Hox genes—that cause embryos to develop properly as well as the (startling) fact that these genes are shared among all animal species, even those separated by half a billion years of evolution.
This is the usual course of science: sciences, like economies, show a kind of creative destruction. And sometimes, as with genetics, the cost is worth it. It was more important to work out the rules of inheritance than of embryology. The same could prove true for genomics. The difficulty, of course, is that in science we can never calculate such opportunity costs in advance since we never know what alternative research programs might discover.
Finally, it’s important to understand what genomics does and does not represent. Some have attempted to portray the rise of genome science as an intellectual revolution. But this is misleading, and it misidentifies the locus of genomics’ true importance. Neither Venter nor anyone else has used data from whole genomes to transform fundamentally our view of the world. We can point to no analog of Darwin’s discovery of natural selection or molecular biology’s elucidation of the chemical basis of life. Instead, the rise of genomics is more like the invention of a powerful new microscope: genomics is a tool. This is not to say, of course, that some young Darwin won’t come along and use genomic data to reach radical new insights about the nature of life. Only that it hasn’t happened yet.
March 20, 2008
1. Some of the same story has been told in previous books. See especially James Shreeve, The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World (Knopf, 2004).
2. Exactly how these sequencing strategies work is somewhat more complicated than I’ve indicated. For more detail, see Greg Gibson and Spencer V. Muse, A Primer of Genome Science (Sinauer Associates, 2004).
3. People generally show few differences at the level of DNA sequence. If we line up DNA sequences from two people, they’ll differ at one out of every hundred to one out of every thousand letters. So while “the” genome sequence produced by either the public or private efforts would be a statistical consensus from multiple people, it would provide a reasonable snapshot of the human genome.
4. As of June 2000, the human genome was not actually fully sequenced and the project’s results were presented as a “draft.” This draft featured many gaps and included about 90 percent of the genome. Most of these gaps have now been filled and at least 92 percent of the genome has been sequenced.
5. David G. Gibson et al., “Complete Chemical Synthesis, Assembly, and Cloning of a Mycoplasma genitalium Genome,” Science, January 24, 2008 (published on-line at www.sciencemag.org/cgi/content/abstract/1151721). The synthesized bacterial genome was not identical to that of natural Mycoplasma. Instead, Venter’s team, led by his close colleague Hamilton O. Smith, disrupted a single gene to ensure that the new bacterium could not cause infection. For more on Venter’s views on the promise of “synthetic genomics” and “environmental genomics,” see his 2007 Richard Dimbleby Lecture (available at www.bbc.co.uk/pressoffice/pressreleases/stories/2007/12_december/05/dimbleby.shtml).
6. Association studies involve many statistical complications. False positives, in which a DNA site falsely appears to be associated with a disease, can easily occur. Confidence in the association of a DNA site with a disease obviously increases when different studies independently point to the same site. See Joel N. Hirschhorn and Mark J. Daly, “Genome-Wide Association Studies for Common Diseases and Complex Traits,” Nature Reviews Genetics, Vol. 6 (February 2005).
7. Such a gene does not necessarily cause the disease. Instead, inappropriate expression of a gene in a tissue might be an effect of another gene, the one that actually causes the disease. Nonetheless, expression studies can provide clues about the biochemical pathways involved in an illness. Note also that chip technology identifying gene expression does not actually detect production of a protein by a gene; instead, it uses an indirect approach to detect production of RNA by a gene; RNA is an intermediary in the production of a protein by a gene.
8. D.J. Begun et al., “Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans,” PLoS Biology, Vol. 5, No. 11 (November 2007). As the authors emphasize, previous tests of the role of natural selection in evolution often involved genes that were studied because other lines of evidence already suggested a history of Darwinian evolution at the gene. The new population genomic approach avoids this bias by studying an essentially random set of genes. Since the statistical analysis involved is powerful only when certain technical conditions are met, Begun et al. restricted their analysis to a large subset of the Drosophila genome, including over six thousand genes. Since their test is also conservative, more than 20 percent of the genes studied may have diverged by natural selection.
9. Some science writers have raised important questions about these issues. An article in The Wall Street Journal (Gautam Naik, “As Gene Tests Spread, Questions Follow,” December 13, 2007), for instance, questioned the usefulness of certain genetic tests marketed directly to consumers. And Scientific American (Laura Hercher, “Diet Advice from DNA?,” December 2007) recently questioned the science that allegedly underlies “nutrigenetics,” in which genetic information is used to infer an optimal diet for an individual.