Genes, Peoples, and Languages
Luigi Luca Cavalli-Sforza’s latest book summarizes the life work of this fascinating polymath, who for the last fifty-five years has been developing ingenious methods to understand the history of everybody. I first encountered his methods by chance thirteen years ago while I was browsing my weekly copy of the international scientific journal Nature. Virtually all of the journal’s articles were written in technical language incomprehensible to laypeople, and indeed to most scientists. There were studies of the high-Tc superconductor YBa2Cu3O7-δ, a c-erb-A binding site, copia element genome reshuffling, corticofugal feedback, and other things that I had never heard of in my career as a scientist.
Halfway through the journal, sandwiched between articles on biomolecular chirality and on African finch bill size polymorphism, I came across something different. Five authors with Italian names, which I took to be pseudonyms, claimed to have extracted the surnames of 10,473,727 Italian telephone subscribers from the phone directories of ninety-one Italian provinces. The authors then allegedly analyzed the surnames by the usual methods applied to bona fide scientific data, such as fitting the names to equations, drawing graphs, and applying statistical tests. The paper was evidently one of those hilarious spoofs of science that Nature editors occasionally run. Cited in the list of references at the end of the paper were other articles by some of the same authors, apparently constituting other spoofs. Into respectable journals like The Annals of Human Genetics those pseudonymous Italians had managed to slip purported analyses of surnames extracted from Sardinian residential electric bills.
As I reread the Nature paper, I gradually realized that it was no joke but instead a brilliant, serious study. Written histories describe migrations, but rarely can say exactly how many people moved, where they originated, and where they ended up. The authors of the apparent spoof had figured out how to extract answers to those questions from local lists of surnames. As a glance at any telephone directory will show, there are a few common surnames (like Smith in the US) and thousands of rarer ones, but a name’s frequency differs between localities. Hence if migration occurs between two localities with different name frequencies, it gradually shifts their frequencies toward each other, in a way that lets one calculate the migration rates by mathematical analysis. For instance, the relative frequencies of the names Garcia and Smith differ between the Mexico City and Los Angeles phone books, reflecting Mexican immigrants to the US and American immigrants to Mexico (many of them named Garcia and Smith, respectively).
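The principle can be illustrated with a deliberately simplified toy model. It is not the method used in the published surname studies, which are more elaborate; it assumes two localities that each generation exchange a fixed fraction m of their residents, and the function names and frequencies below are hypothetical. Under that assumption, each locality's surname frequency moves toward the other's, and the size of the shift reveals m.

```python
def mix_one_generation(p_a: float, p_b: float, m: float) -> tuple[float, float]:
    """Toy symmetric migration model (an assumption for illustration):
    a fraction m of each locality's residents came from the other locality."""
    new_a = (1 - m) * p_a + m * p_b
    new_b = (1 - m) * p_b + m * p_a
    return new_a, new_b


def estimate_migration_rate(p_a_before: float, p_b_before: float, p_a_after: float) -> float:
    """Invert the toy model for locality A: p_a_after = p_a - m * (p_a - p_b)."""
    if p_a_before == p_b_before:
        # Identical frequencies are unchanged by migration, so they reveal nothing.
        raise ValueError("identical frequencies carry no information about migration")
    return (p_a_before - p_a_after) / (p_a_before - p_b_before)


# Hypothetical frequencies of one surname in two localities, and a 2% exchange rate.
a_after, b_after = mix_one_generation(0.05, 0.001, 0.02)
print(estimate_migration_rate(0.05, 0.001, a_after))
```

The inversion only works because the two localities start with different frequencies, which is exactly the condition the paragraph above relies on.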
By analyzing the names in Italian phone directories, the authors of the Nature paper extracted patterns of migration during thousands of years of Italian history. Dante Alighieri had already written around the year AD 1305, “As to the ancient vulgar tongue …I maintain that Italy is divided into two parts, the right and the left. And if anyone should ask what is the dividing line, I answer that the Apennines are the watershed.” The authors’ computer-generated maps confirmed Dante’s intuition: today, surnames show that several of Italy’s sharpest boundaries to migration still coincide with the Apennines. Other surname boundaries delineate the area of western Sicily settled by Phoenicians and Carthaginians in the eighth century BC, the concentration of ancient Greek settlements in northeastern Sicily in the seventh century BC, and the Albanian settlements of the fifteenth century AD in southern Italy. Incredible as it may seem, evidence of those ancient communities lives on in the surnames of modern Italians inhabiting those areas—Italians many of whom are the genetic descendants of those ancient colonists.
The senior author of the papers on surnames was the Stanford professor Luigi Luca Cavalli-Sforza, esteemed by other scientists as the world’s leading expert on human population genetics. It would be a slight exaggeration to say that Cavalli-Sforza studies everything about everybody, because actually he is “only” interested in what genes, languages, archaeology, and culture can teach us about the history and migrations of everybody for the last several hundred thousand years. As for how he came to pursue that modest goal, all of us develop as children a sense of homeland, a landscape that is familiar to us and with which we identify. For instance, after thirty-four years of living as an adult in Los Angeles, I still feel like an alien in Southern California’s chaparral and deserts: the New England forests where I grew up feel more like home. But Cavalli-Sforza’s family moved every few months during his early childhood in Italy, and he spent his student years buffeted around Europe during World War II. Today he commutes as a peripatetic academic between Stanford and Italy, but he feels especially connected to Africa’s pygmies, among whom he worked for twenty years. Those diasporas of his own may lie behind his passion to understand the homelands and migrations of all the world’s peoples.
Genes, Peoples, and Languages is thus doubly interesting: as a window into the history of all of us, and as a window into the mind of a remarkable scientist. Cavalli-Sforza’s most striking intellectual gift is his ability to extract simple but profound conclusions from messy and seemingly trivial data. Besides his mining of Italian phone books and Sardinian electrical bills, he has analyzed questionnaires asking pygmies of different ages and sexes how far from home they had ever traveled, questionnaires asking Stanford undergraduates and their parents whether they preferred butter or margarine, and records of all consanguineous marriages for which papal dispensation was granted in 280 Italian dioceses between 1910 and 1964.
Even someone capable of recognizing that something worthwhile might lie hidden in such lists still faces the difficulty of finding a mathematical model that fits the data and yields conclusions of interest. Miraculously, Cavalli-Sforza has managed to accomplish this again and again. His strategy is to recognize that there are only a few basic types of useful mathematical models, that one can thus reuse the same model in different fields with just small changes, and hence that the key step is to recognize good analogies. For instance, related mathematical models explain compound interest on our bank accounts, antibiotics killing bacteria, the migrations of pygmies—and the great diasporas of human history.
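A minimal sketch of the kind of model reuse described above, covering only the first two examples: continuously compounded interest and the killing of bacteria by an antibiotic (treated here, as a simplifying assumption, as first-order decay) both follow x(t) = x0 · e^(rt), differing only in the sign of the rate r. The function name and all numbers are hypothetical.

```python
import math


def exponential(x0: float, rate: float, t: float) -> float:
    """x(t) = x0 * exp(rate * t): rate > 0 gives growth, rate < 0 gives decay."""
    return x0 * math.exp(rate * t)


# Continuously compounded interest: hypothetical 5% per year on 1000 for 10 years.
balance = exponential(1000.0, 0.05, 10.0)

# Antibiotic killing modeled as first-order decay: hypothetical rate 1.2 per hour,
# starting from a million bacteria and running for 3 hours.
survivors = exponential(1e6, -1.2, 3.0)

print(balance, survivors)
```

Only the parameters change between the two cases; recognizing that the same equation applies is the analogy-spotting step the paragraph describes.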
Curiosity about ourselves is only one reason driving us to understand history. Other reasons include history’s relevance to social issues, such as racism, technological innovation, cultural change, and genetic interventions. Professional historians mine written documents, but writing arose in the Fertile Crescent only around 3400 BC, and elsewhere in the world only later. (The Fertile Crescent is flanked by the Mediterranean on the west and the Tigris and Euphrates on the east.) Hence other methods are needed to reconstruct the 99.9 percent of our history extending from our origins around five million years ago to the origins of writing. Our main sources of information about that preliterate past are, obviously, archaeology, plus (perhaps surprisingly) the languages and genes of living peoples, which reveal fossilized traces of our history to those knowing how to read them. As Cavalli-Sforza explains in the preface to Genes, Peoples, and Languages, there are gaps in what each of those three disciplines—archaeology, linguistics, and genetics—can tell us about history, but combining their data can fill the gaps and converge on a single history.
Initially, one might suppose that archaeology would be our main source of information about the preliterate past. At their most successful, archaeological excavations do yield the bones of past peoples themselves, as well as their tools, pots, and other cultural paraphernalia. In practice, the archaeological record is very spotty, and human bones are often lacking or undiagnostic. For instance, stone spear-points of the so-called Clovis culture, made by perhaps the first humans to reach the Americas around 13,000 years ago, are known in abundance from all of the lower forty-eight US states, as well as from Central America south to Guatemala. But who were the Clovis people themselves? We don’t know, because almost none of their bones have come down to us. We usually assume that they were ancestral Native Americans who migrated over the Bering Strait from Asia, and so it came as a shock to us when the now famous, recently discovered Kennewick skeleton, the oldest complete skeleton known from the Americas, was reported to look more European than Asian.
But conclusions based on a single skeleton could easily be overturned by the discovery of the next one, and the Kennewick case merely emphasizes how fragile conclusions based on archaeology alone can be. Even when archaeological excavations yield tens of thousands of skeletons and millions of tools, as in the case of ancient Europe, they may not even begin to address some central questions of history. When and whence did Indo-European languages, the modern world’s dominant language family and the one in which this review is written, reach Europe? Linguists are still debating the answer, because bones alone give no clue to the language that their owner spoke in life.
That is why historians need the help of scientists like Cavalli-Sforza. Languages and genes of living people contain historical information in far greater abundance than is contained in the relatively few fossils that have come down to us. Obviously, if we could be transported by a time machine back to a group of people living ten thousand years ago, we could listen to their conversations, sample their blood, and identify their languages and genes directly. The recent development of techniques for extracting DNA from mummies and bones is even giving us information about genes of long-dead people. But how does Cavalli-Sforza extract information about past migrations from the languages and genes of living peoples?
Here are two examples illustrating how modern languages let one reconstruct history. The first, concerning the origins and early migrations of the English language, is useful to begin with, because we already know the answer from historical documents. But could we have deduced the answer just from languages spoken by living people? By far the greatest number of native English-speakers today live in North America, with others in Britain, Australia, and elsewhere. Hence an extraterrestrial visitor to Earth not trained in linguistics might be misled into supposing that the English language arose in North America. But a trained linguist would note that English is only one of 144 languages of the Indo-European language family, otherwise mostly confined to Europe and western Asia.
Furthermore, within that family, English is closest to Northern Europe’s Germanic languages, within which group it is most similar to the West Germanic languages centered on Germany (not to the North Germanic languages of Scandinavia), within which subgroup it is most similar to the Low Germanic languages of North Germany (not the High Germanic languages of South Germany), within which sub-subgroup it is closest to the Frisian language still spoken along the North Sea coast from Holland to southern Denmark. Hence a linguist studying only living peoples, and deprived of all historical documents, would still deduce correctly that the English language arose in Europe along the North Sea and spread from there around the world. This of course is what written histories describe in detail, beginning with the Venerable Bede’s account of Angles, Jutes, and Saxons invading England in the fifth century AD.