Genes, Peoples, and Languages
Luigi Luca Cavalli-Sforza’s latest book summarizes the life work of this fascinating polymath, who for the last fifty-five years has been developing ingenious methods to understand the history of everybody. I first encountered his methods by chance thirteen years ago while I was browsing my weekly copy of the international scientific journal Nature. Virtually all of the journal’s articles were written in technical language incomprehensible to laypeople, and indeed to most scientists. There were studies of the high-Tc superconductor YBa2Cu3O7-δ, a c-erb-A binding site, copia element genome reshuffling, corticofugal feedback, and other things that I had never heard of in my career as a scientist.
Halfway through the journal, sandwiched between articles on biomolecular chirality and on African finch bill size polymorphism, I came across something different. Five authors with Italian names, which I took to be pseudonyms, claimed to have extracted the surnames of 10,473,727 Italian telephone subscribers from the phone directories of ninety-one Italian provinces. The authors then allegedly analyzed the surnames by the usual methods applied to bona fide scientific data, such as fitting the names to equations, drawing graphs, and applying statistical tests. The paper was evidently one of those hilarious spoofs of science that Nature editors occasionally run. Cited in the list of references at the end of the paper were other articles by some of the same authors, apparently constituting other spoofs. Into respectable journals like The Annals of Human Genetics those pseudonymous Italians had managed to slip purported analyses of surnames extracted from Sardinian residential electric bills.
As I reread the Nature paper, I gradually realized that it was no joke but instead a brilliant, serious study. Written histories describe migrations, but rarely can say exactly how many people moved, where they originated, and where they ended up. The authors of the apparent spoof had figured out how to extract answers to those questions from local lists of surnames. As a glance at any telephone directory will show, there are a few common surnames (like Smith in the US) and thousands of rarer ones, but a name’s frequency differs between localities. Hence if migration is occurring between two localities with different name frequencies, the relative frequencies at the two localities differ in a way that lets one calculate the migration rates by means of a mathematical analysis. For instance, the relative frequencies of the names Garcia and Smith differ between the Mexico City and Los Angeles phone books, thereby reflecting Mexican immigrants to the US and American immigrants to Mexico (many of them named Garcia and Smith, respectively).
By analyzing the names in Italian phone directories, the authors of the Nature paper extracted patterns of migration during thousands of years of Italian history. Dante Alighieri had already written around the year AD 1305, “As to the ancient vulgar tongue … I maintain that Italy is divided into two parts, the right and the left. And if anyone should ask what is the dividing line, I answer that …