Steven Mithen, professor of early prehistory at Reading University in England, begins his preface by writing that “the propensity to make music is the most mysterious, wonderful and neglected feature of humankind.” Since, until the advent of recordings, most music-making and all singing left no tangible trace behind, that neglect is not really surprising. How can anyone know, or plausibly surmise, what our human and pre-human ancestors did in the way of music-making?

Undaunted, Mithen explains:

I began writing this book to describe my theory as to why we should be so compelled to make and listen to music. I wished to draw together and explain the connections between the evidence that has recently emerged from a variety of disciplines including archaeology, anthropology, psychology, neuroscience and, of course, musicology. It was only after I had begun that I came to appreciate that it was not only music I was addressing but also language: it is impossible to explain one without the other…. And so the result is necessarily an ambitious work, but one that I hope will be of interest to academics in a wide range of fields and accessible to general readers—indeed, to anyone who has an interest in the human condition, of which music is an indelible part.

His book fulfills that high ambition, drawing on impressive research (his endnotes total eighty-one pages) in both recent archaeology and neuroscience. He presents a series of plausible inferences about the making of music among our remote ancestors, as well as among evolutionary dead ends like the Neanderthals of his oddly (I’d even say ill) chosen title.

In his opening chapter Mithen addresses resemblances and differences between language and music. Both, he points out,

are universal features of human society. They can be manifest vocally, physically and in writing; they are hierarchical, combinatorial systems which involve expressive phrasing and are reliant on rules…. Both communication systems involve gesture and body movement….

Yet the differences are profound. Spoken language transmits information because it is constituted by symbols, which are given their full meaning by grammatical rules; notwithstanding formulaic phrases, linguistic utterances are compositional. On the other hand, musical phrases, gestures and body language are holistic: their “meaning” derives from the whole phrase as a single entity. Spoken language is both referential and manipulative…. Music, on the other hand, is principally manipulative because it induces emotional states and physical movement by entrainment [i.e., by carrying the listener along].

Mithen goes on to summarize individual cases of brain damage or congenital abnormality that resulted in partial or complete loss of linguistic or musical capacities. Clinical testing of both disabled and normal persons showed that music and language arouse activity in different parts of the brain with some overlapping. The details of brain function he describes are complex and sometimes confusing, but Mithen concludes:

The case histories described in the two previous chapters indicate that the neural networks that process language and music have some degree of independence from each other; it is possible to “lose” or never to develop one of them while remaining quite normal in respect to the other…. This requires us to consider language and music as separate cognitive domains. Moreover, it is apparent that both language and music are constituted by a series of mental modules. These also have a degree of independence from one another, so that one can acquire or be born with a deficit in one specific area of music or language processing but not in others. The separation of the modular music and language systems, however, is not complete, as several modules, such as prosody, appear to be shared between the two systems.

Such studies show nothing about the origins and development of music, which is what Mithen seeks to discover. However, studies of mothers’ baby talk (or, in the jargon of neuroscience, infant-directed speech—IDS) convinced him that infants are born musicians, and that acquisition of language is both subsequent and different:

The idea that IDS is not primarily about language is supported by the universality of its musical elements. Whatever country we come from and whatever language we speak, we alter our speech patterns in essentially the same way when talking to infants.

Experiments involving six different languages showed that “infants responded in the appropriate manner to the type of phrase they were hearing, frowning at the phrases expressing prohibition and smiling at those expressing approval, whatever language was being spoken and even when nonsense syllables were used.” Even more surprising, other experiments showed “that when we enter the world, we have perfect pitch but that this ability is replaced by a bias towards relative pitch as we grow older.” Otherwise, Mithen explains, we would “be unable to recognize that the same word spoken by a man and a woman with differently pitched voices is indeed the same word.”


He sums up:

…It is evident that those who use facial expressions, gestures and utterances to stimulate and communicate with their babies are effectively moulding the infants’ brains into the appropriate shape to become effective members of human communities, whether we think of those as families or societies at large. Parents largely do this on an intuitive basis—they do not need to be taught IDS—and use music-like utterances and gestures to develop the emotional capacities of the infant prior to facilitating language acquisition.

The reader may ask whether a mother’s baby talk qualifies as music. Mithen argues persuasively that, in a broad sense, such musical elements as pitch, tone, and rhythm of a mother’s voice are central in communication with babies.

The final chapter of the first part of Mithen’s book deals with music, emotion, healing, and intelligence. Mithen describes recent findings as follows:

Music can be used to express our emotions, and to manipulate the emotions and behaviour of others. In modern Western society, and probably in those of all modern humans, music is rarely used in this manner other than for entertainment, because we have a far more powerful means of telling someone what we are feeling: language. But there would have been a time when our ancestors lacked language even though they had a complex range of emotions and the need at times to influence the behaviour of other individuals. That is, in fact, the situation for our close relatives who are alive today, the great apes. And it is to these we must now turn as we begin to examine how music and language evolved.

According to Mithen, grunts, barks, and gestures do indeed make up a communication system among the great apes, probably much resembling the repertoire our pre-human ancestors elaborated into a system of communication without words that he calls “Hmmmm.” This is an acronym for communication that is “Holistic, multi-modal, manipulative, and musical.” This, I judge, is the core of the musical “theory” Mithen set out to explain to his readers, since the notion that “Hmmmm” was the original matrix from which Homo sapiens eventually developed fully grammatical speech and modern forms of music is the theme of the rest of his book.

I must admit that I have some difficulty understanding what holistic communication is really like. Mithen repeatedly declares that holistic utterances “were complete messages rather than words to be combined.” They were, he believes, “greetings, statements and requests.” This implies that holistic utterances had a definite beginning and end, and that their meaning came in matching chunks. Was there silence in between? Or could utterances be prolonged, but somehow segmented as happens in a modern symphony or in the flow of verbal speech? What about accompanying gestures, tones, and volume? Did they all start simultaneously when the utterance began and break off as it ended, or did they have different temporal manifestations? And, of course, how can anyone know? Mithen does not raise these questions, papering them over by using the word “holistic.”

The other elements in his definition of “Hmmmm” are clear enough. “Multi-modal” means that gestures of the entire body, but especially of the face and eyes, were part of the message. “Manipulative” means that the message affected the feelings and behavior of those sending and receiving it. And “musical” means that senders’ voicing varied in pitch and volume and (later on) was rhythmic as well.

Despite my uncertainties about holistic utterances, Mithen has many fascinating suggestions about how the circumstances of early hominid life on the African savanna may have provoked changes in anatomy and improved the range and precision of communication. A key hypothesis is that early hominids

are likely to have chosen to live in much larger groups than their forest-dwelling relatives…. Away from the cover of trees, safety can only be found in numbers, which provide more eyes to notice predators and lessen the chances that any particular individual will be attacked. There is, however, a cost: social tensions leading to conflict can easily arise when large numbers have to live continually in close proximity to one another.

Changed diet—more meat perhaps—together with a widened range of loud and quiet uses of the voice, and anatomical alterations of tongue and mouth, permitted early hominid bands to reduce internal conflicts by elaborating “Hmmmm”—“a communication system more complex than that found now among non-human primates, but one quite different from human language.”

In addition, walking upright on two legs gave new scope to rhythmic bodily movement—walking, running, and dancing—as well as to music-making and communication. But for such early hominids, taking “the role of rhythm and movement in ‘Hmmmm’ to a new order of magnitude” was

only the beginning of an evolutionary process. The holistic phrases used by Homo ergaster [an early African primate]—generic forms of greetings, statements and requests—are likely to have been small in number, and the potential expressiveness of the human body may not have been realized until later, bigger-brained, species of Homo had evolved.

Those bigger-brained humans started to spread from Africa into Eurasia and also began to make spears and hand-axes for hunting and butchering big game. Mithen surmises that they also elaborated “Hmmmm” into “Hmmmmm” by adding “mimesis” of animal behavior and of some important objects and natural features of the landscape to their repertoire. As a result,


“Hmmmmm” might have included utterances that meant “hunt deer with me” or “hunt horse with me.”… Other phrases might have included, for example, “meet me at the lake,” “bring spears,” “make hand-axes,” or “share food with…” followed by a pointing gesture towards the individual or mimesis of that individual.

In this hypothetical communication system, the length and number of utterances were limited by the short- and long-term memory of early humans, which Mithen assumes were just like ours. Establishing meanings for new utterances was also difficult. “The ‘Hmmmmm’ communication system would, therefore, have been dominated by utterances descriptive of frequent and quite general events,” such as hunting and the other activities I have just cited. Such a limited repertoire, in turn, sustained “conservatism in thought and behaviour in a manner that a language constituted by words and grammatical rules would not.” Consequently, early humans showed “a marked lack of innovation throughout their existence between about 1.8 and 0.25 million years ago.”

Mithen then takes up “Singing for Sex,” speculating that decreasing differences between male and female body size had the effect of putting

greater emphasis on female choice…. As a consequence, males would have needed to invest more time and energy in their displays, so as to attract female attention and interest; they could no longer rely on brute force against one another and against the females in order to achieve their reproductive ends.

Probably singing and dancing served that purpose; but like “Hmmmmm” itself, neither left a clear trace in the archaeological record.

Mithen, however, argues that the manufacture of symmetrical and aesthetically attractive hand-axes was a means by which early human males

indicated the kinds of mental and physical capacities that any mother would wish to be inherited by her offspring. This is the essence of what has become known as the sexy hand-axe hypothesis, proposed in 1999 by myself and the evolutionary biologist Marek Kohn.

At first blush, the notion that hand-axes were made to impress future mates and mothers seems preposterous; but Mithen claims that it explains why “we should find so many hand-axes in the archaeological record, often several hundreds discarded together in pristine condition. Once made, they were of limited further use.”

Maybe so, but I am not convinced by Mithen’s assertion that “plain stone flakes, or those minimally shaped by chipping” were just as good as elegant and symmetrical hand-axes for cutting animal carcasses, severing plant stems, and shaping wood. It seems probable to me that the balance and symmetry of ancient hand-axes improved the heft and fitted the hands wielding the axes better than crudely shaped stones could do. If so, their exquisite workmanship was indeed of practical value as well as aesthetically pleasing, and to attribute their manufacture to a kind of sexual display seems far-fetched indeed.

Next Mithen discusses how enhanced communication between mothers and infants solved problems arising when

bipedalism resulted in a relatively narrow pelvis and hence birth canal, which limits the size of infants at birth, especially their brain size. To compensate, infants were effectively born premature and continued rapid foetal growth rates for the first year of life outside of the womb.

He suggests that slings to carry burdensome babies may have been the first form of clothing, and that putting down a baby to free both hands for gathering food and other tasks provoked prehistoric baby talk, using rhythm and musical intonation to impart and sustain emotional ties between mother and infant even when they were not within touching range of one another. And, to induce sleep, he believes that early human mothers resorted to lullabies quite like those still in use among us.

The sexual and nurturing roles for “Hmmmmm,” however, were, for Mithen, secondary to the central value music-making had for our ancestors, i.e., to strengthen cooperation and minimize frictions within the enlarged groups of hunters and foragers who lived on the African savanna. Mithen summarizes a wide range of recent arguments about human cooperation and competition (including, I am happy to say, my own discussion of “boundary loss,” provoked when people participate in dances and drills) before concluding:

Hominids would have frequently and meticulously examined the likely intentions, beliefs, desires and feelings of other members of the group before deciding whether to cooperate with them. But on other occasions simply trusting them would have been more effective, especially if quick decisions were necessary. As a consequence, those individuals who suppressed their own self-identity and instead forged a group identity by shared “Hmmmmm” vocalizations and movements, with high emotional and hence musical content would have prospered…. As Early Humans colonized northern latitudes, developed big-game hunting, and coped with the dramatic environmental changes of the Pleistocene, the need for cooperation became ever greater. Hence communal “Hmmmmm” music-making would have become pervasive in Early Human society.


With that observation, Mithen arrives at the singing Neanderthals of his title. Here is how he introduces them:

Most anthropologists are tempted to equate the large brain of Homo neanderthalensis with a capacity for language…. But the temptation must be resisted; the Neanderthals who inhabited Europe and south-west Asia had brains as large as those of modern humans but behaved in quite different fashion, one that indicates the absence of language…. So, what were the Neanderthals doing with such large brains?

This book has been providing the answer: the Neanderthals used their brains for a sophisticated communication system that was Holistic, manipulative, multi-modal, musical, and mimetic in character: “Hmmmmm.” While this was also the case for their immediate ancestors and relatives,…the Neanderthals took this communication system to an extreme…. They utilized an advanced form of “Hmmmmm” that proved remarkably successful: it allowed them to survive for a quarter of a million years through dramatic environmental changes in ice-age Europe, and to attain an unprecedented level of cultural achievement. They were “singing Neanderthals”—although their songs lacked any words—and were also intensely emotional beings: happy Neanderthals, sad Neanderthals, angry Neanderthals, disgusted Neanderthals, envious Neanderthals, guilty Neanderthals, grief-stricken Neanderthals, and Neanderthals in love. Such emotions were present because their lifestyle required intelligent decision-making and extensive social cooperation.

Among the principal reasons for Mithen’s belief that Neanderthals lacked language are the absence of symbolic artifacts among their archaeological remains, and, more particularly,

the immense stability of their culture. The tools they made and the way of life they adopted at around 250,000 years ago were effectively no different from those current at the moment of their extinction, just after 30,000 years ago. As we know from our own personal experience and from a moment’s reflection on human history, language is a force for change…. So, if the Neanderthals had possessed language, how could their culture have remained so stable and so limited in scope? Well, it simply could not have done so.

This strikes me as convincing. But the proposition that “Neanderthals maintained the capacity for perfect pitch with which we must assume they were born” and surpassed both their predecessors and modern humans in musical ability seems fanciful. Who can tell? The same question applies to Mithen’s argument that the capacity to express many different emotions is needed to make decisions that are conducive to survival. He blithely declares:

Therefore, since making the right decision was of such consequence for the Neanderthals, they must have been highly emotional people, and that would have found expression in their utterances, their postures and their gestures.

Mithen’s further suggestion that the difficulty of tending helpless infants under ice-age conditions may have reshaped gender relations is more persuasive:

With the additional challenges that ice-age life presented, the females now required males to provide resources as well as their genes; they needed males to secure and share food for themselves and their joint offspring, and to provide shelter, clothing, fire and other necessities of ice-age life. They needed reliable males whom they could trust.

In short, according to Mithen, families more or less like ours arose among Neanderthals, and instead of singing and dancing to attract females promiscuously, as would earlier have been the case, “there would have been singing and dancing as a means of advertising and consolidating pair-bonding.” The well-known fact that Neanderthals buried their dead also suggests that family feeling for elders and infants came into play more strongly than before.

Mithen concludes his chapter on Neanderthals with a note of caution:

Trying to understand the “Hmmmmm”ing world of a Neanderthal is challenging, owing to limitations of our imaginations, the inevitable speculation involved, and the restricted evidence on which these speculations must be based. Also, I believe that all modern humans are relatively limited in their musical abilities when compared with the Neanderthals. This is partly because the Neanderthals evolved neural networks for the musical features of “Hmmmmm” that did not evolve in the Homo sapiens lineage, and partly because the evolution of language has inhibited the musical abilities inherited from the common ancestor that we share with Homo neanderthalensis.

Mithen endorses the idea that capacity for language among Homo sapiens was the result of a specific genetic mutation that occurred “during the last 200,000 years of human history, that is, concomitant with or subsequent to the emergence of anatomically modern humans.” Moreover, artifacts with symbolic meanings show up some 70,000 years ago in Africa and “there is compelling circumstantial evidence that by 100,000 years ago, if not before, Homo sapiens were making and using symbols.” Words, Mithen argues, arose from “Hmmmmm” by a process of “segmentation” and as separate words entered current usage, grammar soon followed, allowing the emergence of “compositionality,” by which separate words can be “recombined” to “create an infinite array of new utterances.” This is “the feature that makes language so much more powerful than any other communication system.” His account of these processes relies on recent articles and computer simulations by others, and, as he admits, they are controversial and a good many linguists reject them.

It seems to me that very general words like “segmentation” and “compositionality” do not tell us much about what actually happened. Indeed I come away from Mithen’s account with the feeling that no one knows and one may never know how language arose among humans. Perhaps genetic changes rearranged patterns of connection among brain neurons, perhaps social changes provoked more frequent contact and communication with strangers; and presumably individuals invented and then propagated new words and meanings, just as they still do today. Very likely, all of the above were at work, together with other still-unimagined factors; and none of them will ever be supported by enough evidence to prove them true.

Mithen’s observation about how long it took for language to catch on is more convincing:

Amid a continuation of tool-making traditions that stretch back at least two hundred and fifty thousand years, there are sporadic traces of new behaviour in Africa of the type that archaeologists associate with modern, language-using humans. The transition from a predominantly “Hmmmmm” communication system to a compositional language most likely took tens of thousands of years…. It was not until after 50,000 years ago that many of the new behaviours became permanent features of the human repertoire.

His final chapter then takes up how “music emerged from the remnants of ‘Hmmmmm’ after language evolved.” One of the key changes was that “language-using modern humans were able to invent complex instruments, …providing a host of new possibilities for musical sound.” And, “with the emergence of religious belief, music became the principal means of communicating with the gods.” Continuity with his hypothetical “Hmmmmm” was very strong. Music remained a means of communicating emotion and “still provides some of the adaptive value that was central to ‘Hmmmmm,’ especially in forging group identities; but we also enjoy making music and pursue it at will.” He ends his book as follows:

In spite of all this, words remain quite inadequate to describe the nature of music, and can never diminish its mysterious hold upon our minds and bodies. Hence my final words take the form of a request: listen to music. When doing so, think about your own evolutionary past…. That evolutionary inheritance is why you like music—whatever your particular taste…. Once you have listened, make your own music and liberate all these hominids that still reside within you.


What are we to make of Mithen’s provocative venture beyond the reach of words and firm evidence?

First of all, I entirely agree that speculating about human and proto-human patterns of communication is intellectually legitimate and very much worth undertaking. Our existing level of verbal and musical communication is central to the human condition, distinguishing us from other forms of life, and it is what allows our species to dominate the earth as no single species ever did before. Such abilities must have mattered throughout hominid and human history, even though they leave no direct trace in archaeological remains. By bringing music to the fore, Mithen remedies earlier neglect and offers his readers the most perspicacious portrait of the role of communication among our remote predecessors that I have ever encountered.

That is a great accomplishment, and my doubts about some of Mithen’s wilder flights of fancy do not diminish my admiration for his bold effort to explain how we got to be what we are. In particular, “Hmmmm” and “Hmmmmm” are ingenious acronyms, likely to enter future debates about how language and music intertwine. Mithen’s book, in short, seems destined to become a landmark in the way experts and amateurs alike seek to understand the character and evolutionary importance of hominid and early human communication.

Yet to my mind his argument remains lopsided. The role of musical communication was what he set out to investigate, and when he recognized that rhythmic movement of legs and arms and the emergence of language also belonged in his account, he brought them in, but paid them less attention than I believe they deserve.

Thus, in discussing my book Keeping Together in Time, he recognizes that it is “primarily about rhythmic body movement. This falls, however, under the broad definition of music I have adopted, and McNeill’s arguments are as applicable to singing together as they are to dancing together.” That is quite true; and my book is far more lopsided than Mithen’s, since I failed to give due attention to singing and the instrumental music that started, perhaps, with beating a wooden stick on the ground, and then on a more resonant surface: the first drum. But I still believe that human response to dance and other rhythmic movements of the larger muscles is not simply part of music. It has more distinct, or at least more forceful, physiological and emotional effects than the small muscles of voice box, tongue, mouth, and ears ordinarily arouse, and therefore deserves separate and parallel consideration.

Similarly, with respect to language, it seems to me that Mithen’s preoccupation with music led him to skip over essential characteristics of that new form of communication. In particular, he says nothing about how verb tenses created consciousness of past and future, supplementing present sensory inputs and opening the path for planning ahead and suffering disappointment whenever experience fell short of expectation. That new feature of everyday life was what made language so dynamic—inviting, even compelling, humans to think again and try something different whenever hopes and expectations fell short.

That, it seems to me, was why human society suddenly became so volatile about 50,000 years ago, as behavior came to be governed by a web of words, knitting past, present, and future together in new, unprecedented fashion. To be sure, that was not what Mithen set out to examine; but any balanced, complete account of human communication needs to look into the special power of language more incisively than he does, and (perhaps) also give more attention to how muscular bonding provoked by dance and drill forged powerful (and sometimes entirely new) collective identities among participants.

Still, The Singing Neanderthals, despite its lame title, and a few unpersuasive notions, offers a learned, imaginative overview of the most important and most elusive dimension of the real but unrecorded past: i.e., how communication among our predecessors changed their lives, sustained their communities, and promoted their survival. No one has previously undertaken that task so well.

April 27, 2006