Steven Pinker; drawing by David Levine

Unlike Noam Chomsky’s ambitious recent work in linguistics, Steven Pinker’s Words and Rules is a popular exposition of scholarly work on language. It succeeds in its aim of conveying a great deal of information in a lively and often humorous style.1 The book has a simple thesis, often repeated:

The ingredients of language are words and rules. Words in the sense of memorized links between sound and meaning; rules in the sense of operations that assemble words into combinations whose meaning can be computed from the meanings of the words and the way they are arranged.

If we assume that “computed” in the quoted passage just means “figured out,” then there is a way of interpreting the thesis in which it could hardly be false, and indeed would not be worth writing a book about. We cannot imagine a full-blown spoken human language that did not have both words and rules in Pinker’s rather special senses—that is, that did not have both meaningful spoken units and ways of combining them into larger units such as sentences. Clearly you need both. Words without rules are blind, rules without words are empty.

According to another interpretation the thesis is false. Pinker sometimes talks as if words and rules are sufficient to understand sentences. But that is wrong. We understand the word “cut” in the order “Cut the grass” quite differently from the way we understand “cut” in the order “Cut the cake” even though the same word, in his sense, occurs in both. Similarly, we understand “cut” in the sentences “The barber cut my hair,” “The tailor cut the cloth,” and “The surgeon cut the skin” quite differently because we bring to bear on these sentences a large body of cultural background knowledge of how things work and how they are done. For the same reason we don’t know how to interpret the sentences “Sally cut the sun” or “Bill cut the mountain”: we have no background knowledge that would determine an interpretation for these sentences.

Notice that in the sentences we do not understand, we have perfectly ordinary words in perfectly ordinary combinations, but we still have no idea how to interpret the sentences. The point applies throughout our use of language. We understand the same word “healthy” differently in “healthy complexion,” “healthy diet,” and “healthy body” even though the word keeps the same meaning. The point is one that Pinker seems to have missed: the understanding of any sentence requires a set of capacities that go beyond words and rules in his senses of these notions. In any case, whether we interpret his claim as platitudinous or false, he uses it as an entering wedge into a number of important arguments, as we will see.

One quibble about the title Words and Rules, and the formulation of the thesis: it is misleading for Pinker to treat words and rules as if they were somehow contrasting, because words are also a matter of rules. To know the meaning of a word is, on his own account, to know the rule that associates a sound with a meaning. If I know the meaning of the word “rose,” for example, I know a rule that associates the sound roz with its meaning. And such rules are just as much rules as the rule that says you can form the past tense in English by adding -ed to the verb stem. So, contrary to its title, the book is not about words and rules, but about rules and rules, rules for using words and rules for combining words. Pinker, of course, knows all of this and knows that his contrast is actually between two kinds of rules, which we might think of as particular and general. The rule for forming the past tense in English is general. It says, “For all x if x is a regular verb, form the past tense by adding -ed to the verb stem.” The rule that associates a sound with a meaning is not in that way general. But both types of rules are rules.

If the point that in a language you require at least meaningful elements such as words and rules for combining them into sentences is so obvious, then what is the point of Pinker’s book? His thesis is less trivial when put in the form that language consists of memorized words, each an arbitrary pairing between a sound and a meaning, and a set of rules that assemble words into combinations that give us a potentially infinite number of longer expressions such as sentences. Pinker has important theoretical points to make, and the discussion of “words and rules” is a device for getting at these larger issues. The main point he is making is that there are two quite distinct cognitive operations involved in knowing and using a language: memorization and retrieval of specific cases, “words,” and application of general procedures, “rules.”


He thinks that a good test case for his theory that there are two distinct sorts of linguistic mechanisms is in English irregular verbs, of which there are nearly two hundred. For these verbs we cannot just apply the general rule but must memorize the irregular forms. Once the child knows that the past of walk is walked and the past of stop is stopped, the rule he has learned will not apply to the combinations eat-ate, go-went, ring-rang, blow-blew, is-was, and strike-struck. The rule for forming the past tense in English is wonderfully simple: add -ed. This can receive three different pronunciations, t, d, and id, as in worked, bored, and wanted. There are not three different past-tense rules but one past-tense rule with three different pronunciation (phonological) rules. The irregular verbs seem puzzling because they do not seem to be cases of general rules, but at the same time in many cases they exhibit the features of general rules. Some sets of irregular verbs, for example, come in patterns, thus deal-dealt, feel-felt, and mean-meant as well as write-wrote, drive-drove, and ride-rode and also blow-blew, grow-grew, know-knew, and throw-threw. Furthermore, some of these patterns are productive in the sense that a new verb may be treated as part of the irregular pattern and not given the regular conjugation. Thus if we invent the verb “to spling” and ask ourselves how we would describe someone of whom this verb was true in the past, would we say “he splinged”? Pinker cites studies that show that most speakers prefer “he splung” or “he splang,” on analogy with ring-rang, sing-sang, and wring-wrung.

But if both the regular and irregular past tense involve patterns that people can generalize, then what is the force of Pinker’s distinction between rules and words? His answer is that the irregular patterns involve memorized words and new forms that are similar to the memorized forms. But, he also writes, the regular past tense can apply intelligibly to any word whatever and does not require memorization of a particular word or word pattern. If a child says drived or rided for the past tense, we know what he means. This suggests that there is a psychological and neurobiological distinction between the two sorts of cases. The regular inflection is the normal or default case.
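The two mechanisms just described can be put in a few lines of Python. This is only an illustrative sketch, not anything taken from Pinker’s book: the names `MEMORIZED_PAST`, `past_tense`, and `suffix_sound` are invented here, the word list is a handful of the irregulars mentioned above, and the spelling rule is deliberately crude (it ignores consonant doubling, as in stop-stopped). It also leaves out the family resemblances among irregulars, so an invented verb simply falls through to the default.

```python
# Hypothetical sketch of the "words and rules" idea; all names are assumptions.

# "Words": irregular past forms that must be stored one by one.
MEMORIZED_PAST = {
    "eat": "ate",
    "go": "went",
    "ring": "rang",
    "blow": "blew",
    "strike": "struck",
}

# Final sounds after which the suffix is pronounced "t" (a partial list).
VOICELESS = {"p", "k", "f", "s", "sh", "ch"}


def past_tense(verb, memory=MEMORIZED_PAST):
    """Look the verb up in memory; if nothing is stored, apply the default rule."""
    if verb in memory:
        return memory[verb]
    # "Rule": add -ed to any stem whatever (crude spelling, no doubling).
    return verb + ("d" if verb.endswith("e") else "ed")


def suffix_sound(final_sound):
    """Choose among the three pronunciations of the single -ed suffix."""
    if final_sound in {"t", "d"}:
        return "id"   # wanted
    if final_sound in VOICELESS:
        return "t"    # worked
    return "d"        # bored


print(past_tense("walk"))              # regular verb, default rule
print(past_tense("go"))                # stored irregular
print(past_tense("drive", memory={}))  # nothing stored yet: the child's error
```

With an empty memory the sketch produces drived, the child’s error, which is the sense in which the regular inflection is a default: it needs no stored entry at all.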

The difficulty with Pinker’s appeal to “patterns” is that it looks like an appeal to certain sorts of rules. Each “pattern” exhibits a rule. If that is the right way to think of the matter, then in addition to the difference between particular and general rules we would need rules that apply to families of resembling cases, as in the examples know-knew, blow-blew, grow-grew. I will come back to this point in a moment.

The reason it is impossible to specify the number of irregular verbs exactly is that old irregulars sometimes become regular and new irregular forms are created. Thus if you think the past tense of thrive is thrived, as I do, then the old irregular form throve has ceased for you, and the verb is now a regular verb. On the other hand, if you are comfortable with the past tense of the verb sneak as snuck, then this verb is an irregular for you, and you part company with those who prefer sneaked.

In an important argument Pinker shows that the notion “regular” does not mean statistically more common. The notion of the regular case does not imply that this case is more frequent. Even in a language where most verbs are irregular there would still be a logical distinction between the regular and irregular forms. The distinction is between those cases where the rule can apply to any word in a category and those cases where you must store in your memory a specific word or a pattern. Of the thousand most common verbs in English, 86 percent are regular, but in German only 45 percent of the thousand most common verbs are regular.

Pinker’s hypothesis about particular rules and general rules is important for him because it leads into much larger questions about human cognition. Three of these are especially interesting.

First, the different logical character of the particular and the general suggests that different cognitive abilities and indeed different parts of the brain are involved in the two sorts of abilities. Studies of brain-damaged patients suggest that this is so. One can have damage to the capacity for memorizing words, without hurting the capacity to apply rules. Pinker’s account here is the most intellectually important part of his book. Recent technological advances in brain imaging, especially functional magnetic resonance imaging (fMRI), can give us information about which structures in the brain are processing which information. The good news, Pinker tells us, is that some recent studies show that different parts of the brain are activated for words and rules, in his special sense of these notions. The bad news is that the different research teams do not agree on which parts of the brain are activated by each process. Much of the importance of this work derives from the fact that if he is right then Noam Chomsky is wrong to think of language competence as a distinct faculty in the brain.


The question is not essentially one of anatomy. It is not whether there is a single location in the brain for the language faculty or two different locations. The question is rather a functional question. Is there one set of functions performed by the brain, or are there two distinct sets of functions? According to Pinker’s theory there are two quite different faculties, and they differ both anatomically and in the principles of their operation. This issue is still very much in doubt. Some work by the linguistic scholar Charles Yang attempts to show how the child could acquire both the regular and the irregular verb conjugations using a single mechanism that assigns probability weights to hypotheses on the basis of linguistic evidence from the environment. According to Yang, many of today’s irregular verbs are historical survivors of what were once systematic rules. There was a rule that produced a past -ew whenever -ow occurred in the present, as in know-knew, blow-blew, and grow-grew. By neglecting this historical evidence Pinker mistakenly supposes that the irregular cases have to be memorized on a case-by-case basis, whereas according to Yang what has to be memorized is which rule applies. Yang strengthens his argument by bringing in evidence from other languages. For example, Yang accepts Pinker’s argument that the “default” or “regular” way to form a plural noun in German is to add -s, as in Kinos (cinemas) and Autos. This way of forming the plural is statistically rare, but it is regular in the sense that if you don’t know anything else about the word, if it is a new word for example, you can form the plural by adding -s. That explains why many of the examples are of foreign origin. But there are several irregular ways to form the plural, as in Kind-Kinder (child-children), Strasse-Strassen (street-streets), and Hund-Hunde (dog-dogs). Yang argues that these irregular patterns are rule-based and that the child’s task is not to memorize plurals on a word-by-word basis, but to figure out which rule applies, that is, to which set the noun belongs. If Yang is right, and I think he is, then Pinker’s irregulars are not illustrations of the words-and-rules thesis, but of the less-general-rules-and-more-general-rules thesis.2

Second, Pinker thinks his work bears on the traditional philosophical dispute between rationalists and empiricists. Indeed, in a breathtaking passage he says, “The past tense is the only case I know in which two great systems of Western thought [rationalism and empiricism] may be tested and compared on a single rich set of data, just like ordinary scientific hypotheses.” In company with some other linguists and psychologists, Pinker has a distorted conception of these philosophical movements. He thinks rationalism and empiricism are in some essential way about the nature of thought processes. On his view, the “empiricists” are defined by their belief that mental processes are matters of the association of ideas with other ideas. The “rationalists,” according to him, “were obsessed by combinatorial grammar,” that is, the sort of grammar that shows how meanings of larger units can be determined by the meanings of smaller units and the rules of their combination—rules for example about conjoining phrases by the use of “and” and opposing them by the use of “but.” He thinks the rationalists were obsessed by combinatorial grammar because they believed that “intelligence arises from the manipulation of symbols” according to rules.

Historically this is all wrong. The dispute between empiricism and rationalism—between, on the one hand, Locke, Berkeley, Hume, and their followers, including Bertrand Russell and the logical empiricists such as Rudolf Carnap, and, on the other, Descartes, Leibniz, Spinoza, and their followers—is a dispute about how knowledge claims are to be verified, how we can come to have reliable knowledge about the world as opposed to merely having uncertain and unreliable beliefs. The dispute is not, except incidentally, about thought processes. The empiricists thought that claims to knowledge have to be tested by experience; the rationalists thought that secure and certain knowledge had to be known a priori, prior to any empirical tests, by pure reason, by deducing truths from self-evident axioms. For the rationalists mathematics is the model for all genuine knowledge, and it is no accident that two of the founders of rationalism, Descartes and Leibniz, were brilliant mathematicians.

In the history of knowledge empiricism proved overwhelmingly victorious over rationalism, and, indeed, modern science is the legacy of philosophical empiricism. Kant, in The Critique of Pure Reason, tried to salvage what he could of rationalism from Hume’s criticisms of it; at the time his book was regarded as skeptical because he accepted so many of the empiricist criticisms of rationalism. Nowadays, when the victory of empiricism is taken for granted, his work does not look skeptical. Nowadays, indeed, we are all empiricists, Pinker and Chomsky included.

The basic principle of empiricism, that knowledge of the real world has to be gained by empirical investigation, is now so overwhelmingly accepted that it is hard for Pinker to realize what Descartes, Leibniz, Spinoza, and their followers actually believed. When he tries to characterize rationalism he is hopelessly off the mark when he says that one of its essential traits is to be “obsessed by combinatorial grammar.” One wonders whom he is thinking about. Leibniz, perhaps, but certainly not Descartes or Spinoza, and indeed the greatest twentieth-century empiricist of them all, Bertrand Russell, had more to say about formal combinatorial systems than all the rationalists put together.

Third, Pinker is concerned to argue for his point of view in the dispute between traditional serial computer models of cognition and connectionist computer models. The traditional models treat the brain as a digital computer going step by step through a series of symbol manipulations. Connectionist models are said to be “neuronally inspired.” That is, the connectionist theorists try to create computer models which, like actual brains, process information that is distributed over a large number of elements that are arranged in networks and that operate simultaneously in parallel circuits rather than in the linear serial fashion of traditional computers. Because the connectionist system processes information by a series of interacting parallel processes where the information is distributed over an entire network, this style of computation is often called parallel distributed processing (PDP).

Some years ago Pinker and his colleague Alan Prince had a debate with David Rumelhart and J.L. McClelland, who had developed a connectionist network for learning the past tense. This debate keeps coming up over and over again as a leitmotif in this book, and Pinker is anxious to insist, correctly in my view, that the connectionist networks he criticizes are weak at handling general rules like the past-tense rule but are better at handling specific patterns and associations, such as those involved in face recognition. He tells us that the rule approach of traditional computer modeling is right for the regular past tense, and that connectionism works for the irregulars.

From the way that Pinker describes the debate, however, the reader cannot get a clear picture of the overall issues involved. First, neither the serial nor the PDP simulation actually understands anything. These are simulations of cognition, not duplications. A computer simulation of understanding does not understand the past tense or anything else, any more than a computer simulation of digestion actually digests anything. Secondly, if they are to be construed as models of cognition and not as actual duplications of cognition, at some level the connectionist models have to be on the right track because the brain itself is in fact a set of extremely complex parallel systems.

The trick is to see how such systems can carry out serial logical operations, and none of the PDP models I have seen is very good at such operations. My own guess is that we will get a grip on how a connectionist brain can carry out serial logical operations only when we understand actual brain operations better than we do now. The shift in research interest in cognitive science from computer simulations to neurobiology is taking place right now as cognitive science moves from the computational model to cognitive neuroscience.

For me the best part of Pinker’s book consists of his discussion of linguistic examples, their history and their logical structure. If a baseball player hits a fly ball to center field, and it is caught, why do we say “He flied out to center” and not “He flew out to center”? In a similar vein, he asks, if we go for a joyride around town why do we say “We joyrided around town,” and not “We joyrode around town”? Why does the irregular verb suddenly get a regular past tense? The answer in both cases is that the sentences being put in the past tense do not contain genuine occurrences of the verbs “fly” and “ride.” If the batter hits a fly, the noun “fly” comes from the verb “to fly,” and because virtually any noun in English can be used as a verb (a point Pinker does not make but could have) we can then reverbalize the noun “fly” to get the verb phrases “fly to center” and “fly out,” in which “fly,” like new verbs in general, takes the regular past. The pattern is verb to noun to verb. The same argument works for “joyride,” which is not initially a verb, but a noun made out of a verb with an adjective prefixed. It generates the new verb “joyride,” which like nearly all new verbs takes the regular past, even though it is easy to imagine someone saying “joyrode” as a derisive comment on both the word and the experience.

For Pinker as for Chomsky, the first and deepest puzzle about language is in accounting for its boundless expressive power, its infinite expressive power with finite means. There are, he tells us, two tricks to the workings of language, words and rules. Words require memorized pairing between a sound and its meaning. Meanings, he thinks, are entities. He even draws a picture of one when he gives us a stylized drawing of a rose to represent the meaning of “rose.” The words are combined by the rules of generative grammar and such rules give us the limitless expressive power of human languages.

The pleasures of Pinker’s book are in his accounts of the results of detailed scholarship and theories about particular words and forms. I, for one, did not know that medieval English had the regular past tense for have and make, haved and maked, but that the sheer frequency of the occurrence of these verbs led to our current shorter had and made.

The danger in such a book, and the worst thing we can do when we write for a large audience, is to give the readers the impression that they understand something they do not really understand. This is a danger that Pinker does not always avoid. Of course to understand a language you have to know the meanings of words and be able to combine words according to rules. But Pinker seems unaware that that is not enough. You also need an enormous amount of background knowledge. Furthermore the rules-and-words thesis leaves the interesting questions unanswered. What is a meaning? What is a rule? What is rule-governed behavior anyhow? Pinker has little to say about these issues and what little he does suggest is at best misleading. It is a mistake to think of meanings as some sort of introspective entities. Wittgenstein devoted much of his later philosophy to refuting precisely that conception. It seems to work for simple cases like “rose,” where it is possible to have the naive view that our knowledge of the meaning consists in having a certain mental image of a rose, but as a general account of meaning this is hopeless. Not only do we have no general pictures of “if,” “therefore,” “however,” and “because,” but we do not even have general pictures for “speculation,” “annoyance,” “incoherence,” “analysis,” and “refurbishment,” to mention just a few concepts out of thousands one could mention.

Furthermore, even in cases where we could form a mental picture as in “rose,” “dog,” “cat,” or “house,” having a mental image still won’t tell you how to use the word correctly. Thus, as Wittgenstein argued, even if we carried around little pictures of roses, dogs, cats, and houses, how do you know that the word is used to name a type of object rather than the shape of the object, or the color of the object? Pinker gives the reader the mistaken assumption that questions about the nature of meanings, rules, and rule-governed behavior are all very simple and have indeed been resolved. They have not.

Within its limitations, as an account of recent results in linguistics Pinker’s book is useful. Its big weakness is that it doesn’t go far enough or deep enough. To go farther and deeper, Pinker would have to reflect harder on words, rules, meanings, and language than he is prepared to do in this book. I think the reason he does not go deeper is that he holds a rather naive computational theory of the mind. He thinks that we arrive at the meanings of sentences from words and rules by a kind of mechanical computational procedure of the sort we might simulate on a digital computer. He thinks we find the meaning of the sentence by taking it as a kind of big object and performing combinatorial operations on the little objects that are the meanings of words. That account will not work. It fails to recognize that to understand most sentences you need a great deal of background information that is not contained in the meanings of the words and the rules for their combination.

—This is the second of two articles on linguistics.

March 14, 2002