Noam Chomsky; drawing by David Levine

From time to time, ever since Plato, grammar has been more than the bane of schoolchildren or a topic for scholars. It owes its present prominence outside linguistics to some theses stated twenty-five years ago by Noam Chomsky. There is, he said, a universal grammar common to all human languages. Children are born with it: their inheritance explains the ease with which they pick up the language they hear around them. Universal grammar is like an organ of the body whose structure is genetically determined. It is a characteristic of the human mind and an essential part of the discontinuity between people and beasts.

That is quite an array of paradoxes. How could so arid a subject as grammar be part of the definition of our humanity? When hardly anyone can talk grammatically in more than two languages and when many are deficient in one, what is so universal? There is also a prejudice that Chomsky makes us a little ashamed to confess: grammar is just not the kind of thing one could inherit.

Paradoxes alone did not fuel Chomsky’s success. From the start he had a neat definition of grammar as a set of rules that can be mechanically applied to test whether a string of words forms a grammatical sentence. Then he obtained a negative result. Taking a natural and widespread approach to grammar, he cast that approach into a precise form and proved that it is necessarily incapable of providing an adequate grammar for English. This result was important not only for what it said but also because it suggested a new kind of thing to do—that sort of result had not been thought of before.

Then Chomsky did much positive work. He polished up a current idea of grammatical transformation and made it plausible as the main tool for doing grammar. He used an ear-catching phrase: “deep structure.” By this he meant that the sentences we use in thinking and speaking are the result of transformations on structures that underlie the surface arrangement of nouns, verbs, prepositions, and so forth. The speaker is not consciously aware of these structures or the operations upon them; they must be inferred from linguistic abilities. Deep structure added to the appeal of universal grammar, for the “universal” in grammar might be down there at the not-so-conscious level of deep structure, which is why we never noticed it before.

These proposals have since evolved, and Rules and Representations is a useful book with which to catch up on the state of the art. The book consists of four lectures (the Woodbridge Lectures at Columbia, also given as the Kant Lectures at Stanford) and two related pieces. There is nothing technical in the book, but to read it you do need a relish for argument. The lectures might be called “Chomsky Against the Philosophers.” Philosophers have much admired him but have also criticized some features of his work. Here he examines their arguments. It is like watching the grand master play, blindfolded, thirty-six simultaneous chess matches against the local worthies. He almost always wins.

There is perhaps some general lesson about reason to be gleaned from this book. Chomsky must be one of the most reasoning of living men. I’ve heard him called Talmudic but there is nothing ethnic about this: to me he sounds like a Presbyterian preaching double predestinarianism. He runs through arguments again and again, so that some are repeated in the two papers tacked on to the lectures, and were also found in Reflections on Language (1975). Chomsky is sometimes a teacher saying, “You haven’t quite seen the point, come, let’s go over it again, here is the first premise….”

This passion for reason allows one to forgive what would, in another writer, be repetition. But I liked a remark in the second book under review, which derives from a symposium in France featuring a debate between Chomsky and Jean Piaget, the great Swiss pioneer of the genesis of concepts in the child.* The philosopher Hilary Putnam starts his contribution to the symposium by noting “the sense of great intellectual power” one gets on reading Chomsky, and then announces with exasperation, “Yet I want to claim that individual arguments are not good.” Putnam is Chomsky’s equal as reasoner, but many attentive readers will be unsure who wins this skirmish. Once argumentation has been pushed to this limit, reason alone will not settle much. What matters is the outcome of the research. All we can hope from arguments is the conviction that the research is well motivated—and that, in this case, we get in abundance.

Chomsky used to say that children have innate knowledge of universal grammar. Philosophers have queried whether this could properly be called “knowledge.” Irritated, Chomsky says call it something else; say children “cognize” grammar. What is important for him is that this cognition is a physiological state which will manifest itself in behavior but is not to be defined in terms of behavior. We should not think of cognitive abilities arising from one undifferentiated organ (the brain?) but we should expect a lot of units, or modules, that interact to perform various jobs. Even to produce a single grammatical sentence the brain will employ different modules which may have matured at different stages of development. An infant that has not yet begun to speak still “cognizes” grammar in the sense that it has the appropriate modules which can be triggered by various stimuli as it grows up. If it becomes a feral child growing up alone, it will never mature into speech, but this is just because the appropriate modules have not been triggered.


Such speculations leave untouched many philosophers’ questions about knowledge, but in my opinion they are well left untouched, at least here. There are metaphysical questions about knowledge and there are physiological ones. Even the “rules and representations” of Chomsky’s title turn into physiology. What he means by a rule is plain enough from his examples. There should be a rule for forming questions out of declarative sentences, say: take the verb that comes after the first noun phrase and move it up front. (“The man who wore black was ill” becomes “Was the man who wore black ill?”) Such a rule works on an analysis of the sentence: it does not say, move the first verb (“wore”), but, move the first verb after the noun phrase (“was”).
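The contrast between the two candidate rules can be made concrete in a few lines of Python. This is only a toy sketch: the split of the sentence into noun phrase, verb, and remainder is supplied by hand, and the names `VERBS`, `question_by_structure`, and `question_by_position` are illustrative inventions, not anything found in Chomsky's text.

```python
VERBS = {"was", "wore"}  # assumed toy lexicon, just enough for the example


def _as_question(words):
    """Lowercase the words, capitalize the first, and add a question mark."""
    words = [w.lower() for w in words]
    words[0] = words[0].capitalize()
    return " ".join(words) + "?"


def question_by_structure(noun_phrase, verb, rest):
    """Front the verb that follows the whole first noun phrase."""
    return _as_question([verb] + noun_phrase + rest)


def question_by_position(words):
    """Front the first verb in the bare string, ignoring phrase structure."""
    i = next(k for k, w in enumerate(words) if w.lower() in VERBS)
    return _as_question([words[i]] + words[:i] + words[i + 1:])


# Hand-made analysis of "The man who wore black was ill".
noun_phrase = ["the", "man", "who", "wore", "black"]
verb, rest = "was", ["ill"]

structural = question_by_structure(noun_phrase, verb, rest)
linear = question_by_position(noun_phrase + [verb] + rest)
print(structural)  # Was the man who wore black ill?
print(linear)      # Wore the man who black was ill?
```

The rule that looks only at word order produces nonsense, while the rule that works on the analysis gives the question an English speaker would actually ask.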

Deep structure is no longer prominent in Chomsky’s work. The rules he examines now work close to the surface of the words we actually utter. I say “close to the surface” because the rules do not act only on the strings of words that we utter or hear, but also on something like echoes of other sentences. The rules involve unuttered “traces” of transformations, and so this development is called trace theory. For example, colloquial English can contract “want to” to “wanna”—“Who do you wanna meet?” But though this is a contraction of “Who do you want to meet?” we do not contract “Who do you want to meet Bill?” (“Who do you wanna meet Bill?” is odd). Chomsky explains the difference by saying that questions bear a trace of the declarative sentence. “You want to meet x” is a declarative form whose question is “Who do you want to meet t?” Here the trace t marks the place that the “who” came from. The declarative form “You want y to meet Bill” has the question “Who do you want t to meet Bill?” Here the unuttered trace t comes between “want” and “to,” and stops the contraction “wanna.” This representation of traces, of echoes just below the level of the uttered surface of words, is what Chomsky calls “S-level.”
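As a rough illustration of how an unuttered trace could block the contraction, here is a small Python sketch. It assumes a deliberately crude model in which an S-level representation is just a list of words with the marker `t` standing for the trace; the function name `utter` and this list encoding are hypothetical conveniences, not Chomsky's formalism.

```python
TRACE = "t"  # stands for the unuttered trace in the S-level representation


def utter(s_level):
    """Contract adjacent 'want to' to 'wanna', then drop the unuttered traces."""
    out = []
    i = 0
    while i < len(s_level):
        if s_level[i:i + 2] == ["want", "to"]:
            out.append("wanna")  # contraction applies only to strict neighbors
            i += 2
        else:
            out.append(s_level[i])
            i += 1
    return " ".join(w for w in out if w != TRACE) + "?"


# "You want to meet x" -> "Who do you want to meet t?"
object_question = ["Who", "do", "you", "want", "to", "meet", TRACE]
# "You want y to meet Bill" -> "Who do you want t to meet Bill?"
subject_question = ["Who", "do", "you", "want", TRACE, "to", "meet", "Bill"]

print(utter(object_question))   # Who do you wanna meet?
print(utter(subject_question))  # Who do you want to meet Bill?
```

The contraction is applied before the trace is dropped, so in the second question “want” and “to” are never neighbors at the level where the rule operates.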

A sentence may be represented by the words we utter; it may be represented at S-level; it may be represented at deeper levels of analysis. Rules operate on representations, at some level or other. In the above example, the rule operates at the S-level. Such representations are all themselves bits of language. Most grammarians want nothing more than an analysis stated in language. But Chomsky calls himself a psychological realist. For every item of psychology, such as a representation of a sentence, there is to be a corresponding bit of physiology. There must be a representation in the brain, to which a rule, in the brain, applies. Maybe a mechanical analogy will help to explain this.

Take a really old-fashioned calculator, a nineteenth-century difference machine made of brass and steel. Given a sequence of numbers it could calculate, say, the second differences, that is, the differences between the successive differences of the members of the sequence: not just the difference, for example, between numbers a and b, but the difference between that difference and the difference between b and c. (This was part of the trick of making tables of logarithms.) Set some numbers on the machine, turn the crank, and it prints out a sequence of second differences. We have a rule expressed in a natural language such as English (“print the sequence of second differences”) and output in English, the printout of numbers. But we also have a nonlinguistic version of the rule and the sequence, in the settings on brass and steel. The rule is made incarnate in brass and steel, and so is the sequence of numbers. As you crank it, the machine first arrives at and then operates on a “representation” of first differences in order to calculate the second differences; but there is nothing linguistic about this, it is just an arrangement of brass and steel.
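For readers who would rather see the arithmetic than imagine the brass, the same rule can be written as a short Python sketch. The function names and the choice of the square numbers as input are my own illustration; the point is only that the list of first differences is an intermediate representation on which the rule operates a second time.

```python
def differences(seq):
    """Return the differences between each pair of neighboring members."""
    return [b - a for a, b in zip(seq, seq[1:])]


def second_differences(seq):
    """Differences of the differences, via an intermediate representation."""
    first = differences(seq)  # the machine's "representation" of first differences
    return differences(first)


squares = [n * n for n in range(1, 7)]  # 1, 4, 9, 16, 25, 36
print(differences(squares))         # [3, 5, 7, 9, 11]
print(second_differences(squares))  # [2, 2, 2, 2]
```

The second differences of the squares are constant, which is the kind of regularity that makes tabulating by differences practical.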


In the same way, Chomsky thinks of representations made incarnate in flesh and blood, and the rules, themselves incarnate, act on these. Doubtless, he says, different modules are employed in connection with different levels of representation. The machine is a poor analogy with Chomsky’s thought because it ignores the creative aspect of language use. The difference machine is determined from its initial setting to its final printout while the mind has a lot of freedom. I use the machine to emphasize that representations of sentences need not be anything linguistic, although we can put each representation into a linguistic form: we can produce a blueprint of the first-difference stage of the machine’s operations, something linguistic corresponding to the nonlinguistic arrangement of brass and steel.

The idea of flesh-and-blood representations escapes an objection that derives from Wittgenstein and other philosophers. At the very beginning of the lectures Chomsky refers to “the myth of the museum”—the idea that our minds are like museums containing mental objects that we can inspect and that are called meanings. In explaining meanings, so goes the Wittgensteinian argument, it is no use postulating mental objects, like museum specimens, as what we intend. That would invite a regress, for how do we pick out the right mental object? Do we need a mental rule to do so? If we use a mental rule to apply a verbal rule, how do we apply the mental rule? As Chomsky says, if this objection were sound, it would seem to apply quite generally. Hence it would be a threat to the postulation of any mental entities to explain intentional behavior, including speech. Now the Wittgensteinian argument regards the mental entities as themselves language-like—that is how the “regress” is effected. Chomsky’s internal rules and representations, as I understand them, are not language-like at all, and so the regress is blocked. The flesh-and-blood rules can be described in language (e.g., by a grammarian) just as the brass-and-steel settings on the difference machine can be shown on a blueprint. But the blueprint is not what causes the machine to work, and the grammarian’s description of the rules and representations is not what we use, either consciously or unconsciously, in producing sentences. The grammarian’s description is part of an account of a flesh-and-blood module in the brain that we do use.

The Chomsky program is easily misunderstood on this point because at a quite different level we also use rules of grammar to regiment our children and to make sense out of long-winded authors who need to be parsed in order to be understood. It cannot be too much emphasized that Chomsky’s rules and representations are not tools for pedants but descriptions of the brain. Would Wittgenstein be happy with this gloss? No, for he says, sometimes, that we should not aim at explanation at all, and in particular, our communication could have the character it does regardless of how the brain worked. And perhaps there is still a regress lurking around the corner—how do I know what I want the grammatical module to allow me to say, right now, with this sentence? Well, that will be a matter of interacting with other modules. There seems to be some nagging ill-formulated question that would arise even if we knew about all the modules—a question about which we may learn more from reading Wittgenstein than Chomsky.

Chomsky’s psychological realism in any case has had plenty of critics, for he cannot point to any modules in the brain. He defends it as good methodology. It is the standard method of science, the “Galilean style,” which has been the only show in town for the last three and a half centuries. Frame powerful hypotheses rich in explanatory power and try to work them out in detail. All hypotheses are tentative. Some will be refuted and all will be revised. A hypothesis made in this spirit takes for granted that what it is talking about is real. Certainly, grants Chomsky, there are philosophical questions about the “reality” of theoretical entities, but these are questions about physics just as much as psychology. By all means ask whether atoms and electrons are real, but don’t think there is some special question about psychological realism.

Historians will have qualms about this simplified history of scientific method since Galileo, but certainly the method of hypothesis is respectable right now. We should distinguish two kinds of things: (1) a picture of what reality might be like and (2) a hypothesis which has some immediate experimental hookup with some observable consequences. Democritus and Lucretius told a story about atoms, saying, that is what the world is like, “atoms and the void.” Powerful as this picture was, it had no observable consequence. Even the seventeenth-century atomists whose culmination was Newton chiefly advocated a picture of a world composed of little bouncy balls, with precious few observational consequences.

Only at the beginning of the nineteenth century did the atomist picture start to interlock with observational data, and only at the beginning of our century did the majority of physicists become convinced atomists. We might say that the picture guided the minds of men, but did not do any work. A hypothesis does work when it has specific observational consequences; such a hypothesis, when it involves theoretical entities, certainly takes for granted the reality of its entities. The critics of Chomsky’s psychological realism may be thinking that he is offering only a picture, and then trying to pass that off as a hypothesis that does some work. To see if that is correct we should distinguish some parts of the doctrine.

There are four main ideas in Chomsky’s work:

1) Transformational grammar: a conjectured and constantly revised set of rules and representations for some parts of English.

2) Universal grammar: the claim that all human languages share a common grammatical core.

3) Psychological realism: the grammar of a language is incarnate in the flesh and blood of its speakers.

4) Genetic grammar: a central part of everyone’s grammar is inherited in his genes.

Genetic grammar commits you to universal grammar and psychological realism, but otherwise these four are pretty independent. The best transformational grammar may come from the pen of someone who hotly denies 2, 3, and 4. A psychological realist could reject universal and genetic grammar, holding that each language has its own psychological reality. A universal grammarian can reject inheritance—that seems to be Piaget’s position during his debate with Chomsky.

A conjectured transformational grammar for English is of course a hypothesis that, in the terms I have been using here, does some work. One tests it against the phenomena of English sentences. What about the claim for universal grammar? That does work exactly once in these two books, rather late in the Debate. Chomsky examines relative and restrictive clauses. (“The man who came to dinner was ill” has the clause “who came to dinner” that restricts the subject. “The man, who came to dinner, was ill” contains a relative clause that says something about the man already singled out.) The kinds of clause behave differently. Chomsky offers a rule to explain the difference. But it does not apply in Japanese; indeed that language does not have a clear distinction between the two kinds of clause. So, we are told, something must be wrong with the rule. But we are given no rule that holds in Anglo-Japanese. This is a case in which the idea of universal grammar is doing some work. Several writers have tried to carry on discussions of it in this way, but there is no evidence in the books under review of much success. Only when we are getting somewhere with universal grammar will its critics think this is a working hypothesis and not a mere picture.

A picture of what the world might be like is very often vastly more important—and more long-lived—than any hypothesis. Atomism has endured forever. It was first propounded to explain some puzzles about motion and solidity that we have not quite forgotten. It is worth recalling the facts which Chomsky thinks are most surprising, and most worthy of understanding. There are two.

1) The fact that children come to talk grammatically at quite an early age. In general children are not taught to speak, and the words they overhear are insufficient to fix the grammar which, in fact, they acquire when quite young.

2) The fact that there is a sensation of grammar. We can tell almost at once which short sentences are grammatical. Chomsky drew our attention to a special case of this. Some sentences are ambiguous simply in virtue of their grammar, while very similar sentences are not; how come we so instantly tell the two kinds apart?

Chomsky has always found these two facts dramatic, demanding an explanation as profound as genetic universal grammar. This is not a hypothesis that explains the facts in any detail, but a picture of what the world might be like, and a proposal of where to look for detailed hypotheses. If one does not like the picture one has an obligation to produce another one (possibly playing down Chomsky’s facts and emphasizing others). Hence the debate with Jean Piaget was a good idea, for here is another school of cognitive psychology that might give us another picture of the grammatical child.

Piaget has long studied the ways in which children mature in their abilities. His work has the greatest interest for our conceptions of space and time, topics which, for him, have a Kantian motivation. He is skeptical of standard evolutionary theory, for he thinks there is still some room for Lamarck-like adaptation of successive generations of a kind of organism to its environment. He thinks that a child, as it matures, repeats some of the stages in the intellectual development of mankind, so that the emergence of its reasoning skills resembles the history of mathematics itself. He claims to have found sharp discontinuities in the development of a child’s abilities, discontinuities that correspond to differences in logical structure. He thinks human minds are born pretty empty but form successive spatial structures in the course of interacting with the world; the final product is “our” spatial world. Before that there are other spaces that the child inhabits, preconditions for the final spatialization that derives from the way in which the child comes to handle objects. If we transform this to the grammatical sphere, there would be no grammar carried in the genes. Just as there is a sequence of spatial structures that are “constructed” by the child in interaction with its environment, so we might by analogy expect a sequence of grammars, of which the end product is a grammar of English. The intervening grammars would apply to various levels of childish talk, and we might look for sharp differences between them. We would expect each successive child-grammar to be the product of the child’s interaction with the people that talk to it, and its own attempts to communicate. Moreover, we could suppose that each child-grammar must be constructed before the child can pass on to the next more complex grammar, leading in due course to the steady state of adult English.

The Debate had promise but was a failure. Piaget is conciliatory, Chomsky firm. Piaget and his associates do not really attend to what I call the “sensation” of grammar. The format for the Debate was a French conference to which a small number of distinguished scientists and philosophers were invited. Interesting things were said and this book is a good bedside dipper. It is fun to read about David Premack teaching plastic “words” to his chimpanzees. The implied argument is that the animals acquire a syntax too, so grammar can’t be as specific to humans as Chomsky contends. I was glad to find Jean-Pierre Changeux complaining that linguists and the like keep on treating the brain and genetics as a “black box” when a lot is known. He speculates on the amount of information that can be genetically carried, notes that it is too little for something like grammar, but then makes a fairly standard remark of importance. If the genetic material is deployed in a hierarchical way, the possibilities of inheritance are immensely increased along with possibilities for minor deviations. As he says, this could be made to fit with Piaget’s picture. But no one has yet made Piaget’s picture mesh with the facts of grammar that Chomsky thinks are important. The conversations recorded in the Debate are about learning. They are quite idle until one thinks about the “sensation” of grammar. Hence the two sides in the debate simply don’t speak to each other.

Is there any picture of grammar that can rival Chomsky’s? There is a naïve picture. A child overhears lots of sentences and has a marvelous memory. Maybe we should not say “memory” but invent a word like the “cognize” that Chomsky substitutes for “know.” Anyway the child files away lots of sentences, and constructs others by ringing a few changes on these. Most children do a lot of rehearsing in their cribs, and that, says the naïve picture, is a matter of storing sentences in the head, as well as hoping for some parental correction.

How does the naïve view differ from Chomsky’s? It can have psychological realism. What a child learns is encoded at a physiological level. It can have universal grammar, but only on the side. It assumes all sorts of innate abilities, like the ability to imitate sounds. It might even use early work by Chomsky and Miller on the relation between short term and long term memory.

The naïve view can make no sense of deep structure, so the development of Chomsky’s ideas away from deep structure lessens the contrast between these two pictures. The echo of a transformation that was used to prevent “want to” from turning into “wanna” is just what the naïve theorist would expect. There might be a big file of sentences encoded in the brain and the child hears “echoes” of these and so says “want to.” Could trace theory be the theory that brings Chomsky back to naïve reflections on language?

The answer is a resounding NO but only because we come back, as always, to where Chomsky started. His first philosophy teacher was Nelson Goodman, who was Chomsky’s sponsor when he was a Junior Fellow at Harvard, but who has no truck with innatism. Goodman showed in a vivid way that past experience is an inadequate guide to future experience. In Chomsky’s terms, the past experience of a child is an inadequate guide to the grammar it so firmly masters. One step in Chomsky’s argument has convinced almost every one of his readers: it is now a commonplace. But if we are to reconsider the naïve theory for a moment, nothing should be commonplace. He says the child learns an infinite language on the basis of finite input. To which the naïve theorist can say Yes and No. Yes, for on the basis of the “sensation” of grammar, the child can then go to school and learn some rules, devised long ago by Latin teachers, or nowadays by mathematicians, on the basis of which it can both parse long obscure sentences and see how to generate indefinitely long sentences (“He swam” add “and” add “He swam” add “and” add etc.). But this reasoning surely uses a different module from the ones connected with the “sensation” of grammar which it was our task to explain? The class of sentences for which we have a “sensation” of grammar—and which gets the whole program going—does seem large but finite.
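The schoolroom point, that one finite rule yields indefinitely long sentences, can be sketched in Python. The function `conjoin` is a hypothetical illustration of the “He swam and he swam” recipe, nothing more.

```python
def conjoin(n):
    """Apply the single rule 'add "and he swam"' n times to the base sentence."""
    return "He swam" + " and he swam" * n + "."


for n in range(3):
    print(conjoin(n))
# He swam.
# He swam and he swam.
# He swam and he swam and he swam.
```

There is no largest value of `n`, so the rule generates an unbounded set of sentences; whether anyone has a “sensation” of grammar for `conjoin(500)` is exactly the question the naïve theorist raises.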

Is the grammar that the child acquires too “large” to be based on what it overhears? That is where we come down to details. In both books under review Chomsky cites some curious distinctions in our usage of reciprocal phrases such as “each other.” He says all children make the distinctions but these cannot have been based on what they overheard. If you doubt this, he says in the course of the Debate, do an experiment. It would be some crazy experiment based on videotapes of twelve years of the lives of lots of children, but, says Chomsky, we’d learn little relevant from that. The naïve theorist will agree, but say that is because we have no idea what questions to ask from such an impossible videotape. What we need is a better theory of what the child remembers, when, and how it echoes its memories through trace theory.

When we have some such picture we can start to ask Piaget-like questions. Piaget found that there are sharp discontinuities between the ages at which children perform only slightly different spatial tasks. It would be an achievement to find similar discontinuities in the “sensation” of grammar. That would prove nothing about Piaget’s picture. Chomsky would say we had only discovered facts about the triggering of grammatical modules. But one such single discovery would move two competing pictures a little closer to the point at which they would become working hypotheses. That, as always, is the way that speculation gets turned into knowledge.

October 23, 1980