Beyond Words


Steven Mithen, professor of early prehistory at Reading University in England, begins his preface by writing that “the propensity to make music is the most mysterious, wonderful and neglected feature of humankind.” Since, until the advent of recordings, most music-making and all singing left no tangible trace behind, that neglect is not really surprising. How can anyone know, or plausibly surmise, what our human and pre-human ancestors did in the way of music-making?

Undaunted, Mithen explains:

I began writing this book to describe my theory as to why we should be so compelled to make and listen to music. I wished to draw together and explain the connections between the evidence that has recently emerged from a variety of disciplines including archaeology, anthropology, psychology, neuroscience and, of course, musicology. It was only after I had begun that I came to appreciate that it was not only music I was addressing but also language: it is impossible to explain one without the other…. And so the result is necessarily an ambitious work, but one that I hope will be of interest to academics in a wide range of fields and accessible to general readers—indeed, to anyone who has an interest in the human condition, of which music is an indelible part.

His book fulfills that high ambition, drawing on impressive research (his endnotes total eighty-one pages) in both recent archaeology and neuroscience. He presents a series of plausible inferences about the making of music among our remote ancestors, as well as among evolutionary dead ends like the Neanderthals of his oddly (I’d even say ill) chosen title.

In his opening chapter Mithen addresses resemblances and differences between language and music. Both, he points out,

are universal features of human society. They can be manifest vocally, physically and in writing; they are hierarchical, combinatorial systems which involve expressive phrasing and are reliant on rules…. Both communication systems involve gesture and body movement….

Yet the differences are profound. Spoken language transmits information because it is constituted by symbols, which are given their full meaning by grammatical rules; notwithstanding formulaic phrases, linguistic utterances are compositional. On the other hand, musical phrases, gestures and body language are holistic: their “meaning” derives from the whole phrase as a single entity. Spoken language is both referential and manipulative…. Music, on the other hand, is principally manipulative because it induces emotional states and physical movement by entrainment [i.e., by carrying the listener along].

Mithen goes on to summarize individual cases of brain damage or congenital abnormality that resulted in partial or complete loss of linguistic or musical capacities. Clinical testing of both disabled and normal persons showed that music and language arouse activity in different parts of the brain with some overlapping. The details of brain function he describes are complex and sometimes confusing, but Mithen concludes:

The case histories described in the two previous chapters indicate that the neural networks that process language and music have some degree of independence from each other; it is possible to “lose” or never to develop one of them while remaining quite normal in respect to the other…. This requires us to consider language and music as separate cognitive domains. Moreover, it is apparent that both language and music are constituted by a series of mental modules. These also have a degree of independence from one another, so that one can acquire or be born with a deficit in one specific area of music or language processing but not in others. The separation of the modular music and language systems, however, is not complete, as several modules, such as prosody, appear to be shared between the two systems.

Such studies show nothing about the origins and development of music, which is what Mithen seeks to discover. However, studies of mothers’ baby talk (or, in the jargon of neuroscience, infant-directed speech—IDS) convinced him that infants are born musicians, and that acquisition of language is both subsequent and different:

The idea that IDS is not primarily about language is supported by the universality of its musical elements. Whatever country we come from and whatever language we speak, we alter our speech patterns in essentially the same way when talking to infants.

Experiments involving six different languages showed that “infants responded in the appropriate manner to the type of phrase they were hearing, frowning at the phrases expressing prohibition and smiling at those expressing approval, whatever language was being spoken and even when nonsense syllables were used.” Even more surprising, other experiments showed “that when we enter the world, we have perfect pitch but that this ability is replaced by a bias towards relative pitch as we grow older.” Otherwise, Mithen explains, we would “be unable to recognize that the same word spoken by a man and a woman with differently pitched voices is indeed the same word.”

He sums up:

…It is evident that those who use facial expressions, gestures and utterances to stimulate and communicate with their babies are effectively moulding the infants’ brains into the appropriate shape to become effective members of human communities, whether we think of those as families or societies at large. Parents largely do this on an intuitive basis—they do not need to be taught IDS—and use music-like utterances and gestures to develop the emotional capacities of the infant prior to facilitating language acquisition.

The reader may ask whether a mother’s baby talk qualifies as music. Mithen argues persuasively that, in a broad sense, such musical elements as pitch, tone, and rhythm of a mother’s voice are central in communication with babies.

The final chapter of the first part of Mithen’s book deals with music, emotion, healing, and intelligence. Mithen describes recent findings as follows:

Music can be used to express our emotions, and to manipulate the emotions and behaviour of others. In modern Western society, and probably in those of all modern humans, music is rarely used in this manner other than for entertainment, because we have a far more powerful means of telling someone what we are feeling: language. But there would have been a time when our ancestors lacked language even though they had a complex range of emotions and the need at times to influence the behaviour of other individuals. That is, in fact, the situation for our close relatives who are alive today, the great apes. And it is to these we must now turn as we begin to examine how music and language evolved.

According to Mithen, grunts, barks, and gestures do indeed make up a communication system among the great apes, probably much resembling the repertoire our pre-human ancestors elaborated into a system of communication without words that he calls “Hmmmm.” This is an acronym for communication that is “Holistic, multi-modal, manipulative, and musical.” This, I judge, is the core of the musical “theory” Mithen set out to explain to his readers, since the notion that “Hmmmm” was the original matrix from which Homo sapiens eventually developed fully grammatical speech and modern forms of music is the theme of the rest of his book.

I must admit that I have some difficulty understanding what holistic communication is really like. Mithen repeatedly declares that holistic utterances “were complete messages rather than words to be combined.” They were, he believes, “greetings, statements and requests.” This implies that holistic utterances had a definite beginning and end, and that their meaning came in matching chunks. Was there silence in between? Or could utterances be prolonged, but somehow segmented as happens in a modern symphony or in the flow of verbal speech? What about accompanying gestures, tones, and volume? Did they all start simultaneously when the utterance began and break off as it ended, or did they have different temporal manifestations? And, of course, how can anyone know? Mithen does not raise these questions, papering them over by using the word “holistic.”

The other elements in his definition of “Hmmmm” are clear enough. “Multi-modal” means that gestures of the entire body, but especially of the face and eyes, were part of the message. “Manipulative” means that the message affected the feelings and behavior of those sending and receiving it. And “musical” means that senders’ voicing varied in pitch and volume and (later on) was rhythmic as well.

Despite my uncertainties about holistic utterances, Mithen has many fascinating suggestions about how the circumstances of early hominid life on the African savanna may have provoked changes in anatomy and improved the range and precision of communication. A key hypothesis is that early hominids

are likely to have chosen to live in much larger groups than their forest-dwelling relatives…. Away from the cover of trees, safety can only be found in numbers, which provide more eyes to notice predators and lessen the chances that any particular individual will be attacked. There is, however, a cost: social tensions leading to conflict can easily arise when large numbers have to live continually in close proximity to one another.

Changed diet—more meat perhaps—together with a widened range of loud and quiet uses of the voice, and anatomical alterations of tongue and mouth, permitted early hominid bands to reduce internal conflicts by elaborating “Hmmmm”—“a communication system more complex than that found now among non-human primates, but one quite different from human language.”

In addition, walking upright on two legs gave new scope to rhythmic bodily movement—walking, running, and dancing—as well as to music-making and communication. But for such early hominids, taking “the role of rhythm and movement in ‘Hmmmm’ to a new order of magnitude” was

only the beginning of an evolutionary process. The holistic phrases used by Homo ergaster [an early African primate]—generic forms of greetings, statements and requests—are likely to have been small in number, and the potential expressiveness of the human body may not have been realized until later, bigger-brained, species of Homo had evolved.

Those bigger-brained humans started to spread from Africa into Eurasia and also began to make spears and hand-axes for hunting and butchering big game. Mithen surmises that they also elaborated “Hmmmm” into “Hmmmmm” by adding “mimesis” of animal behavior and of some important objects and natural features of the landscape to their repertoire. As a result,

“Hmmmmm” might have included utterances that meant “hunt deer with me” or “hunt horse with me.”… Other phrases might have included, for example, “meet me at the lake,” “bring spears,” “make hand-axes,” or “share food with…” followed by a pointing gesture towards the individual or mimesis of that individual.

In this hypothetical communication system, the length and number of utterances were limited by the short- and long-term memory of early humans, which Mithen assumes were just like ours. Establishing meanings for new utterances was also difficult. “The ‘Hmmmmm’ communication system would, therefore, have been dominated by utterances descriptive of frequent and quite general events,” such as hunting and the other activities I have just cited. Such a limited repertoire, in turn, sustained “conservatism in thought and behaviour in a manner that a language constituted by words and grammatical rules would not.” Consequently, early humans showed “a marked lack of innovation throughout their existence between about 1.8 and 0.25 million years ago.”

