I. Jensen Then and Now
There are many styles of retreat in the face of failure. As a first and most forthright strategy, one can simply be humble and contrite. Clarence Darrow once stated that if God really existed after all, and if he were, following his death, arraigned before God as judge with the twelve apostles in the jury box, he would simply step up to the bench, bow low, and say: “Gentlemen, I was wrong.” In a second, intermediate strategy—the stiff upper lip—one looks upon the bright side (or sliver) of admitted adversity. When Robert FitzRoy, Darwin’s captain on the Beagle, learned that Jemmy Button, the Fuegian native he had trained in English ways, had “reverted” completely to old habits within months of his return, FitzRoy took refuge in the thought that “a ship-wrecked seaman may hereafter receive help and kind treatment from Jemmy Button’s children; prompted, as they can hardly fail to be, by the traditions they will have heard of men of other lands.” As a third tactic, one proclaims triumph and punts hard. I remember Senator Aiken’s brilliant solution to the morass of Vietnam—that we should simply declare victory and get out.
Arthur Jensen has published an 800-page manifesto embodying this third strategy. To understand why it represents a retreat—and a failed retreat at that—we must review the history of its genesis. In his notorious article of 1969,1 the founding document of “Jensenism” as a public issue, Jensen maintained that compensatory education must fail because the black children that it attempted to aid were, on average, genetically inferior to white children in intelligence. He based his claim on a strong form of genetic argument: the heritability of IQ, he maintained, had been adequately estimated at about 0.8 among whites; therefore, the 15 point average difference in IQ scores between blacks and whites must be largely innate in origin.
The intervening decade between this article and the present book has not been kind to Arthur Jensen. First of all, the estimate of heritability, depending so heavily on Sir Cyril Burt’s faked data,2 is clearly too high. Second, and more important, the value of heritability within either the white or the black population carries no implication whatever about the causes for different average values of IQ between the two populations. (A group of very short people may have heritabilities for height well above 0.9, but still owe their relative stature entirely to poor nutrition.) Within-group and between-group variation are entirely different phenomena; this is a lesson taught early in any basic genetics course. Jensen’s conflation of these two concepts marked his fundamental error.
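The parenthetical example can be made concrete with a small simulation (a hypothetical sketch with illustrative numbers, not drawn from any real population): within each of two groups the trait is highly heritable, yet the entire gap between the group means is imposed by environment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Identical genetic distributions in both groups (arbitrary trait units).
genes_a = rng.normal(0.0, 1.0, n)
genes_b = rng.normal(0.0, 1.0, n)

# Modest environmental noise within each group yields high within-group
# heritability, but group B suffers a uniform environmental deficit.
env_noise = 0.5
deficit = 1.5
trait_a = genes_a + rng.normal(0.0, env_noise, n)
trait_b = genes_b + rng.normal(0.0, env_noise, n) - deficit

def heritability(genes, trait):
    # Fraction of trait variance attributable to genetic variance.
    return np.var(genes) / np.var(trait)

print(round(heritability(genes_a, trait_a), 2))   # high (about 0.8) in group A
print(round(heritability(genes_b, trait_b), 2))   # equally high in group B
print(round(trait_a.mean() - trait_b.mean(), 1))  # yet the mean gap is entirely environmental
```

Heritability of 0.8 within each group coexists with a between-group difference that owes nothing to genes—precisely the logical gap in the 1969 argument.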
I assume that Jensen now understands where he went seriously astray. The present book bypasses the issue of heritability entirely. In dismissing this previous bulwark of his system in just two paragraphs, Jensen simply states that the matter is too complicated for treatment here, though he relishes some of the most arcane and complex issues of psychometrics throughout his 800 pages. Heritability, he argues, “is a highly technical and complex affair involving the principles and methods of quantitative genetics.” “Because even an elementary explication of heritability analysis is beyond the scope of this book, the interested reader must be referred elsewhere.” His list of appended references includes not a single one of several cogent critiques directed against his original thesis.3
Moreover, Jensen now claims that there really isn’t any need to talk about genetics anyway: “Because we have no estimate of the individual’s genotype that is independent of the test score, there is really no point in estimating genotypic values.” In fact, he now virtually argues that the subject of causation should be dropped entirely: “The constructors, publishers, and users of tests are under no obligation to explain the causes of the statistical differences in test scores between various sub-populations. They can remain agnostic on that issue.” Yet his acceptance of this very obligation is the motivating theme of his 1969 article.
Am I not being unkind in bringing all this up? A man should be allowed to change his mind with grace, and to save face in his expiation. But Arthur Jensen hasn’t altered his basic tune at all. He is simply using a different, and more indirect, argument to prop it up. And he has buried the central fallacies of that argument so deeply among the apparent rigor of these 800 pages of lists, figures, and charts that no commentator in the mass media has yet ferreted them out.4 Jensen’s fundamental claim is still about innateness. Indeed, it is still the same claim: blacks are less intelligent than whites and this difference cannot be attributed to environment.
In reasserting his 1969 claim in its more indirect form, Jensen constructs an argument of three parts:
- The average difference in IQ between whites and blacks is about 15 points, or one standard deviation. Other tests of intelligence show comparable differences.
- The tests are unbiased.
- IQ (and other valid tests of the same mental attributes) measure something that we can legitimately call “intelligence.”
Note that although the argument says nothing about genetics or innateness, it seems to lead inexorably in that direction. After all, if blacks perform more poorly than whites on unbiased tests that measure intelligence, then blacks must be less intelligent than whites for reasons unrelated to environmental deprivation (our usual supposition for the cause of “bias,” as we use the term in the vernacular). What reasons besides innateness are left?
Of the three arguments, only the first, an undisputed fact, compels any assent in our usual, vernacular interpretation. It also leads nowhere because it implies nothing whatever about the reasons for the difference. The tests themselves may record nothing of interest; potential reasons for difference span the entire range from pure environmental imposition to pure innateness. Jensen’s argument becomes meaningful and controversial only if the second and third points are valid in our vernacular understanding. I believe that they are not valid and that this book, despite its wealth of interesting technical detail for psychometricians, therefore contains no important or general message to enlighten the public concern that we all must feel for the issue of human potential.
II. The Meanings of Bias
Jensen has titled his book Bias in Mental Testing, and most of its ample length is dedicated to proving either that there isn’t any, or that it can be recognized and corrected if there is. And I’m sure that he’s right.
The last paragraph may, at first glance, seem utterly destructive to my case, but it isn’t. Things are seldom what they seem in statistics, and a layman’s understanding of the field has been plagued by important differences between vernacular and technical meanings of terms. “Significance” and “discrimination” may provide the two most notable cases of difference between English vernacular and statistician’s jargon, but “bias” belongs in the same category. In proving that tests are not biased, Jensen speaks to statisticians’ interests and not at all to what the public understands by the common charge that IQ tests are biased.
Average black IQ in America is about 85, average white IQ about 100. The charge of bias, in our ordinary understanding of the word, holds that this poorer performance of blacks is a result of environmental deprivation relative to whites, and that it does not reflect inherent ability. The vernacular charge of bias (I shall call it V-bias) is linked to the idea of fairness and maintains that blacks have received a poor shake for reasons of education and upbringing, rather than of nature.
For two months, I have tabulated every use of the word I have seen in the popular press, and all have conformed to this understanding. The New York Times, for example, reported that Federal Judge Robert Carter “has ruled that the examination [for police officers] was biased” because so few black and Hispanic applicants scored among the highest grades. And The Sacramento Bee outlined Judge Robert F. Peckham’s decision to ban IQ tests as a criterion for placing children in EMR (educable mentally retarded) classes in California. Peckham was disturbed by the preponderance of blacks in such classes and ruled “that there probably could not be any substantial disproportion of blacks…if the process of selection was unbiased.” In other words, both assumed that the rarity of high scores for blacks or low scores for whites does not reflect natural aptitude fairly. If Jensen had proved that tests are unbiased in this sense (V-bias), he would have made an important and deeply troubling point.
But “bias,” to a psychometrician, has an utterly different and much narrower meaning—and Jensen addresses himself only to this technical sense (which I shall call S, or statistical bias). An intelligence test is S-biased in assessing two groups only in the following circumstance: Suppose that we plot the scores on an intelligence test showing them in relation to what we wish to predict from the test—job performance or school grades, for example. The test is unbiased in a statistician’s sense if and only if points for blacks and whites fall along the same line—that is, if blacks and whites, plotted separately, differ in neither slope, nor “y-intercept” (the point of intersection between the line for whites or blacks and the vertical axis), nor standard error. If this seems confusing, consider Figure 1, an example of intercept bias. Whites and blacks have the same slope, but whites have a higher y-intercept.
It is not difficult to see why psychometricians want to rid themselves of S-bias; for in an S-biased test, the same score yields different predictions based upon group membership. In Figure 1, an IQ of 100 predicts poorer grades for a black than for a white. No sensible tester wants to construct an instrument in which the same score means different things for different kinds of people.
Jensen devotes most of his book to showing that S-bias does not affect mental tests (or that it can be corrected when it does exist). Yet I found nothing surprising in his densely documented demonstration that tests are unbiased in this sense. It would be a poor reflection indeed on the technical competence of psychometricians if, after nearly a century of effort, they had found no way to eliminate such an elementary and undesirable effect.
Thus, in saying that the tests are unbiased, Jensen has only managed to show that the lower black and higher white mean scores lie on the same line (see Figure 2). And this unsurprising demonstration says nothing at all about the vernacular charge of bias. Does the lower black mean reflect environmental deprivation rather than inherent ability (V-bias)?
Of course, Jensen admits this. He distinguishes his notion of bias (S-bias) from our vernacular idea of fairness to all cultures. He also admits that such fairness cannot be defined objectively and thus undermines his own larger case: “One can determine with objective statistical precision how and to what degree a test is biased with respect to members of particular subpopulations. But no such objective determination can be made of the degree of culture-loadedness of a test. That attribute remains a subjective and, hence, fallible judgment…. The term ‘bias’ is to be kept distinct from the concept of fairness-unfairness.”
Yet these brave words are obfuscated or diluted throughout the book for three reasons. First, although he makes the distinctions fairly and forthrightly, he buries them on two pages in the middle of a lengthy work, and does not emphasize them thereafter. Second, Jensen correctly points out that some kinds of S-bias may have an environmental source. (The higher y-intercept of whites in Figure 1, leading to higher school grades for whites than for blacks for the same test score, may reflect environmental advantages not measured by the test.) Thus, the concepts of S-bias and environmental difference become subtly conflated—even though the existence of S-bias is irrelevant to the key question about environment that has sparked the whole debate: does the lower black mean reflect environmental disadvantages? Indeed, Jensen’s 1969 article argued that the mean differences could not be attributed to V-bias because they are primarily genetic in origin.
Third, and most importantly (and annoyingly), Jensen, after making clear distinctions between S-bias and culture fairness, then proceeds to confuse the issue completely by using “bias” in its ordinary vernacular sense over and over again. He speaks, for example, of “the hypothesis that, when the Stanford-Binet is administered to any population other than the original normative sample, the different population should score lower than the normative sample because of cultural biases.” In another place he speaks of two tests that were “culturally biased” to award rural or urban children the higher score. These passages speak of mean differences between groups that may lie on the same line in plots of test scores vs. criterion. The potential cultural bias is therefore V-bias. But Jensen has told us that we may not use the term “bias” for such a concept.
In short, the primary content of this book is simply irrelevant to the question that has sparked the IQ debate and that Jensen himself treated in his 1969 article: what does the lower average score of blacks mean? His concept of bias (S-bias) does not address this issue. Yet, since this issue is intimately associated with our vernacular meaning of bias, nonstatistical reviewers (in Time and Newsweek, for example) have been consistently confused into believing that Jensen’s voluminous data force us to reject environmental causes as the basis for group differences in IQ scores.
III. Is Intelligence a “Thing”?
The Harvard psychologist E.G. Boring once suggested that psychologists might avoid the vexatious issue of identifying intelligence and worrying about whether or not tests capture it simply by defining intelligence as whatever the tests test. This brand of pragmatism has never appealed to hereditarians who want to believe that intelligence is a real attribute—something objective, “out there,” and measurable by tests. In addition, hereditarians have tended to adopt what I like to call the “fallacy of the ladder”—namely, that intelligence (or at least the most important aspect of it) is a unitary quantity that can be assigned as a single number to each individual. People may then be ranked on an ascending ladder from ape to Einstein—a single scale that captures the most important aspect of their potential. The entire concept of IQ is rooted in this fallacy.
The only important theoretical justification that psychometricians have ever offered for viewing intelligence as a real attribute that can be measured by a single number (IQ) arises from an arcane subject called “factor analysis.” Jensen argues correctly that factor analysis is “absolutely central to any theoretical understanding of intelligence.” Factor analysis in psychometrics has received virtually no discussion outside professional circles. This is particularly unfortunate since the history and theory of IQ testing cannot be understood without reference to it.
In arguing that intelligence is a single, definable “thing,” and that blacks possess less of it than whites, Jensen has resurrected the original form of Charles Spearman’s argument for factor analysis, a hypothesis that had been (or so I had thought) at best moribund since L.L. Thurstone made his devastating criticisms during the 1930s.
Charles Spearman developed factor analysis in 1904 to deal with an interesting, though unsurprising, observation: if several mental tests are given to several people, scores tend to be positively correlated—that is, people who do well on one kind of test tend to do well on others. Spearman wondered whether there might not be some common factor underlying this tendency for similarity of performance in each individual. He compiled what statisticians call “a matrix of correlation coefficients” and extracted from it a single number, which he called g or general intelligence. Spearman concluded:5
All examination in the different sensory, school, and other specific faculties may be considered as so many independently obtained estimates of the one great common Intellective Function.
A correlation coefficient is a measure of association between two tests given to several people. Its value ranges from -1.0 (a good score on one test implies an equally bad score on the other) through 0.0 (score on one test allows no prediction about score on the other) to 1.0 (good score on one implies equally good score on the other). Most correlation coefficients between mental tests are positive, but nowhere near a perfect value of 1.0—that is, people who do well on one test tend to do well on the other, but one score doesn’t predict the other perfectly and not everyone will do well on both tests. A matrix of correlation coefficients is simply a table that lists all the individual coefficients for a set of several tests.
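Such a matrix is easy to construct. In the sketch below (hypothetical scores, built to mimic the four-test situation discussed later—two verbal and two arithmetic tests), every test shares a common component, and each pair within a cluster shares a further component, so all correlations are positive with subclustering:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical components of performance (illustrative only).
shared = rng.normal(size=n)   # common to all four tests
verbal = rng.normal(size=n)   # common to the two verbal tests
arith = rng.normal(size=n)    # common to the two arithmetic tests

tests = np.column_stack([
    shared + verbal + 0.8 * rng.normal(size=n),  # verbal test 1
    shared + verbal + 0.8 * rng.normal(size=n),  # verbal test 2
    shared + arith + 0.8 * rng.normal(size=n),   # arithmetic test 1
    shared + arith + 0.8 * rng.normal(size=n),   # arithmetic test 2
])

# The matrix of correlation coefficients: a 4x4 table of pairwise r values.
R = np.corrcoef(tests, rowvar=False)
print(R.round(2))
# All entries positive; the two verbal tests correlate more strongly with
# each other (about 0.76) than with either arithmetic test (about 0.38).
```

Every off-diagonal entry is positive but well short of 1.0—the usual situation for batteries of mental tests.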
Spearman believed that his g represented a physical entity. He identified it with a general inborn “cerebral energy,” and argued that blacks and poor people had less of it than whites and upper-class people. Sir Cyril Burt, Spearman’s successor as professor of psychology at University College, London, devoted the largest part of his career to defending the hereditary interpretation of g. We now know he used partly faked data to do so. But is g really an entity at all? To appreciate why it is not, we must understand how g is extracted from a set of correlation coefficients. Fortunately, this can be explained in a simple, geometrical way.
A set of correlation coefficients may be represented as a group of vectors (lines) radiating from a common point. With some (conceptually unimportant) simplification, we draw these lines as equal in length and state that the correlation coefficient between any two tests is given by the cosine of the angle separating the two vectors for these tests. (This matches our intuitions well. Two perfectly correlated tests have overlapping vectors—the cosine of zero degrees is 1.0. Two independent tests have vectors at right angles—the cosine of 90 degrees is zero. The closer any two lines, the higher their correlation coefficient.) In Figure 3, I consider four tests (two verbal and two arithmetic) in two-dimensional space. All four tests are positively correlated (any two vectors are separated by an angle less than 90 degrees), but verbal and arithmetic tests form separate subclusters (the two verbal tests are more strongly correlated with each other than either is to any arithmetic test). This represents the usual situation of positive correlation among all tests with a tendency for subclustering among tests of common character.
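The geometry is worth a quick numerical check (the 0.76 figure is illustrative, of the order found between two similar tests): a correlation is just the cosine of an angle, so a strong correlation corresponds to two vectors lying close together.

```python
import numpy as np

# Correlation as the cosine of the angle between two test vectors.
r = 0.76                          # e.g., between two verbal tests
angle = np.degrees(np.arccos(r))
print(round(angle, 1))            # about 40.5 degrees: the vectors lie close together

r_indep = 0.0                     # two uncorrelated tests
print(np.degrees(np.arccos(r_indep)))  # 90 degrees: vectors at right angles
```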
Now Spearman’s g is simply what statisticians call the “first principal component”6 of this set of vectors. It is the line of best fit, the single axis that “explains” more information in all the vectors than any other line in any other position could. In our figure, it runs (unsurprisingly) right through the middle of the cluster. Since vectors for all tests tended to lie near this first principal component, Spearman thought he had discovered an underlying, common intelligence that each test measured only imperfectly. Hence he called this axis g, or general intelligence. In our two-dimensional plots of these four vectors, we may fit a second principal component at right angles to the first. The projection of vectors upon it does identify a slight separation of the verbal from the arithmetic cluster, but the effect is small because the vectors are primarily resolved into the first component (g). In real data, the effect is often completely erased by patterns of variation and errors of measurement. Hence, principal component axes are good at identifying common variance among all tests and poor at defining clusters.
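Extracting this first principal component is a purely mechanical operation on the correlation matrix. The sketch below uses an illustrative matrix for the four-test example (round numbers of my own devising, not Jensen’s data): g is the leading eigenvector, scaled by the square root of its eigenvalue.

```python
import numpy as np

# Illustrative correlation matrix: two verbal tests (0.75 with each
# other), two arithmetic tests (0.75), and 0.40 across clusters.
R = np.array([
    [1.00, 0.75, 0.40, 0.40],
    [0.75, 1.00, 0.40, 0.40],
    [0.40, 0.40, 1.00, 0.75],
    [0.40, 0.40, 0.75, 1.00],
])

eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings on the first principal component -- Spearman's g.
g_loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])
if g_loadings.sum() < 0:                    # fix eigh's arbitrary sign
    g_loadings = -g_loadings

print(g_loadings.round(2))                  # all four tests load heavily on "g"
print(round(eigvals[0] / eigvals.sum(), 2)) # share of total variance resolved by g
```

All four tests load about equally and heavily on this single axis, which resolves roughly 64 percent of the total variance—exactly the pattern that tempted Spearman to reify the axis into an entity.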
From 1904, when Spearman’s seminal article appeared, through the 1930s, factor analysis became an industry in psychology; it was invariably done in the principal components orientation described above. The g that it identified as the first principal component was reified into an entity and both people and groups were ranked according to the amount of it they supposedly contained. (Cyril Burt called his major book “The Factors of the Mind.”) Since vectors for standard IQ tests tend to lie close to the first principal component, they seemed to be an adequate surrogate for g and a valid criterion for unilinear ranking of people according to amount of intelligence. Spearman’s g became the rationale for extensive programs of sorting and streaming in education and therefore affected the lives and careers of millions. It was, for example, the primary justification for sorting British schoolchildren into separate schools at age eleven (when g had stabilized and before “group factors” of specialized abilities became important). And, to use a faddish phrase of the moment, the “bottom line” of that sorting, no matter what its official rationale, was “smart” (20 percent) and “dumb” (80 percent).
In the 1930s, the American statistician and psychologist L.L. Thurstone virtually destroyed this edifice with a simple and elegant argument. He pointed out that the principal components orientation for axes had no theoretical, mathematical, or psychological necessity. It merely represented one of a literally infinite number of possible positions for the placement of axes through a swarm of vectors. Where you place the axes depends upon what you want to learn. Given our deep and subtle prejudices for unilinear ranking and notions of progress, and our not so subtle preferences for ordering people by inferred “worth” (with one’s own group invariably most worthy), it is not surprising that principal components seemed the most “natural,” indeed the only proper way to perform factor analysis. But, Thurstone argued, suppose we are most interested in locating clusters of more specialized abilities, not in finding some inchoate, common variance. Then it would be better to place axes near the clusters themselves, in an orientation that Thurstone called “simple structure.”
Figure 4 shows a simple structure solution for the same four vectors illustrated previously. Note that verbal and arithmetic clusters are now clearly separated by high projections on one axis with correspondingly low projections on the other. Thurstone used these simple structure axes to identify what he called “primary mental abilities,” or PMAs. (He too committed the fallacy of reification and called his simple structure axes—and his major book—“Vectors of Mind.”)
Simple structure axes are every bit as good, mathematically, as principal components. They resolve the same amount of information and can be massaged to yield equally cogent psychological interpretations. But note what has happened to g, the supposedly ineluctable and innate quantity of general intelligence. It has disappeared; it just isn’t there any more. Instead of a pervading and dominating general intelligence and some secondary factors, we now have a set of PMAs. The data have not changed one whit. If the same data can yield either a dominant g with subsidiary factors, or a set of PMAs and no g at all, then what claim can either solution have to necessary reality?
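Thurstone’s point can be verified directly. The sketch below (a simplified, orthogonal 45-degree rotation, not Thurstone’s full simple-structure procedure, applied to the same illustrative matrix as above) rotates the first two axes toward the verbal and arithmetic clusters: the information resolved is identical, but the dominant g vanishes into two comparable cluster factors.

```python
import numpy as np

R = np.array([
    [1.00, 0.75, 0.40, 0.40],
    [0.75, 1.00, 0.40, 0.40],
    [0.40, 0.40, 1.00, 0.75],
    [0.40, 0.40, 0.75, 1.00],
])
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
L = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])  # 4x2 loadings on first two axes

theta = np.pi / 4                            # rotate the axes 45 degrees
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
L_rot = L @ rot                              # same data, new axes

var_pc = (L ** 2).sum(axis=0)                # variance resolved per unrotated axis
var_rot = (L_rot ** 2).sum(axis=0)           # variance resolved per rotated axis
print(var_pc.round(2))   # dominant first axis: the "g" of Spearman and Burt
print(var_rot.round(2))  # two equal cluster factors: g has disappeared
print(round(var_pc.sum() - var_rot.sum(), 10))  # total information unchanged
```

The rotation is a free mathematical choice; nothing in the data dictates the placement of the axes, and the dominance of g survives only under one of the possible placements.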
Moreover, Thurstone’s system destroys the rationale for unilinear ranking. Instead of a dominant g, we have only a set of PMAs—and lots of them. Some folks are good at some things, others at others. What else can one say? As Thurstone wrote:
Even if each individual can be described in terms of a limited number of independent reference abilities, it is still possible for every person to be different from every other person in the world. Each person might be described in terms of his standard scores in a limited number of independent abilities. The number of permutations of these scores would probably be sufficient to guarantee the retention of individualities.7
In defending his third essential claim—that IQ tests measure something we may legitimately call intelligence—Jensen has resurrected the original form of the Spearman-Burt argument for g and the principal components solution. His commitment to the idea that intelligence is a single quantity, distributed in varying amounts among God’s creatures, is manifest in the most naïve bit of writing about evolution I have seen in years. Jensen would actually extend g throughout the animal kingdom, resurrecting for it the evolutionary ladder that Lamarck advocated, but that Darwin pulled down in showing us that phylogeny is a copiously branched tree, not a treadmill toward progress. Jensen writes:
The common features of experimental tests developed by comparative psychologists that most clearly distinguish, say, chickens from dogs, dogs from monkeys, and monkeys from chimpanzees suggests that they are roughly scalable along a g dimension…g can be viewed as an interspecies concept with a broad biological and evolutionary base culminating in the primates.
But primates are no culmination of anything, just a limb on the mammalian tree; and chicken-dog-monkey-chimp is simply not an evolutionary sequence. Jensen’s earlier statement about “different levels of the phyletic scale—that is, earthworms, crabs, fishes, turtles, pigeons, rats, and monkeys” is even more risible, especially when we recognize that modern bony fishes evolved more than 100 million years after turtles and that the evolutionary connection of crabs and vertebrates is not from one to the other, but through some unknown common ancestor that lived more than 600 million years ago. “The turtle,” whatever that means since there are hundreds of species of them, is not, as Jensen claims, “phylogenetically higher than the fish” (meaning even less since rivers, lakes, and oceans contain some 20,000 species of them).
Jensen is not even content with g as a criterion of ranking in this world. He would extend it throughout the universe! “The ubiquity of the concept of intelligence is clearly seen in discussions of the most culturally different beings one could well imagine—extraterrestrial life in the universe…. Can one easily imagine ‘intelligent’ beings for whom there is no g, or whose g is qualitatively rather than quantitatively different from g as we know it?” With such devotion to quantified, unilinear ranking of intelligence through the universe, it is not surprising that Jensen, in analyzing mere mortals of a single species, would resurrect the Spearman-Burt argument and elevate it from one model among many to necessary reality.
Of course, Jensen is not unaware of Thurstone’s critique, and he does discuss it at length. His defense of g as dominant and ineluctable arises from a general observation about Thurstone’s rotated simple structure axes. In Figure 4, note that the simple structure axes do not lie within the clusters, but outside them. This occurs because the clusters themselves are positively correlated (separated by an angle of less than 90 degrees) while the simple structure axes are defined as mathematically uncorrelated (separated by 90 degrees). Thus, the axes miss the clusters, though they lie close to them. Thurstone recognized that clusters might be better defined if axes passed right through them, as in Figure 5. In this so-called “oblique” system of simple structure, the axes themselves are now positively correlated, and this correlation does record a kind of second-order g, as Thurstone admitted.
Thus, Jensen argues, g must be real. It can’t be avoided because it appears both directly in principal components and indirectly as the cause of positive correlation among oblique simple structure axes. Yet Jensen has missed or ignored Thurstone’s repeated claim about this indirect form of g: it is generally a weak, secondary effect accounting for a small percentage of total variance among all tests. Jensen’s argument requires not merely that g exist, but that it be, quantitatively, the major source of variance. He writes: “We are forced to infer that g is of considerable importance in ‘real life’ by the fact that g constitutes the largest component of total variance in all standard tests of intelligence or IQ.”
In sum, there remains a fundamental difference between g as the first principal component (Spearman-Burt) and g as a second-order correlation of oblique simple structure axes (Thurstone): g is usually dominant in the first and very weak in the second—while Jensen’s argument requires that it be dominant.8 Since principal components and oblique simple structure represent two equally valid methods of factor analysis, we are forced to conclude that the dominant g required by Jensen (which appears only in principal components) is not a fact of nature, but an artifact of choice in methods.
Behind this technicality lies an even deeper error in the identification of g with a single quantity defined as that elusive property of “intelligence.” Spearman’s g is a measure abstracted from correlation coefficients, and the oldest truism in statistics states that correlation does not imply cause (consider the perfect positive correlation between my age and the price of gasoline during the last five years). Even if dominant g were an ineluctable abstraction from the correlation matrix, it still wouldn’t tell us why mental tests tend to be positively correlated. The reason might be largely innate as Jensen assumes and requires—people who do well on one test generally do well on others because they have more inborn intelligence. Or it might be largely or totally environmental—people who do well on one test generally do well on others because they had a good education, enough to eat, intellectual stimulation at home, and so forth. The environmental interpretation undermines Jensen’s repeated claim (in its vernacular meaning) that whites are more intelligent than blacks as measured by performance on unbiased tests—yet this interpretation is fully consistent with Spearman’s g. Jensen’s argument suffers a double defeat: Spearman’s g is only one of several valid ways to represent data, not an ineluctable entity. Even if it were an entity, it could not be identified with any innate property meriting the name “intelligence.”
We can conclude: (1) Spearman’s g is at best one of several ways to summarize sets of data on correlations between mental tests (at worst it is an artifact of method). In any case, it cannot be viewed as an ineluctable entity because other equally valid techniques either do not find g in the same data or find it in quantities too small to matter. (2) Even if g be admitted in Spearman’s original form, its basis cannot be specified from psychometric data. Possibilities range across the entire spectrum from pure environmental advantage or disadvantage to the inborn difference in amount that Jensen and other hereditarians require.
IV. Rigor in Numbers and Argument
Jensen’s prior preference for linear ranking according to a single, largely innate quantity called “general intelligence” not only leads him to invalid or irrelevant claims (the meaning of bias and the interpretation of factor analysis); it also skews his interpretation of numerous facts, liberally studded throughout the book, that common sense would read differently. Jensen, for example, enthusiastically reports a correlation coefficient of brain size and IQ of about 0.3. He doesn’t doubt that this correlation records natural selection operating for greater intelligence through larger brains. He regards it as remarkable that the correlation is as high as it is since “much of the brain is devoted to noncognitive functions.”
Yet at the bottom of the very same page, he records a correlation of equal strength (average of 0.25) between IQ and body stature. This, he doesn’t doubt, “almost certainly involves no causal or functional relationship.” Jensen is so attached to his preferred scheme of argument that the obvious interpretation of these facts has not occurred to him—that the weak correlation of IQ and height reflects environmental (largely nutritional) advantages favorable to both, and that the correlation of IQ and brain size is a non-causal, indirect consequence of it, since big people have bigger body parts, including brains, arms, and legs (though no one has ever thought of computing a correlation of IQ with leg length; choice of question is, indeed, a function of expectation).
Other facts, proving (it seems to me) that environment exerts a powerful influence upon average IQ within groups, are either glossed over or reported in other contexts. I found nothing in Jensen’s book more striking than a chart on page 569 showing that children tested in 1972 on the Stanford-Binet, and scored according to the 1937 norms, have an average IQ of 106.3—a general gain of more than one third of a standard deviation over the standardized 1937 mean of 100. Within some age classes, the average gain is considerably higher—10.8 points at age 3 1/2, for example (the “Sesame Street” effect, perhaps).
This general gain can hardly be ascribed to genetic causes; it reflects whatever improved literacy, earlier access to information through radio and television, better nutrition, and so forth have wrought in just thirty-five years. When we recognize that the average black-white difference is 15 points, and that gains of up to two thirds of this amount have occurred in certain age groups as a result of general changes in environment not specifically directed toward this end, then why should we be ready to conclude that group differences are ineluctable? I know of no fact that so clearly underscores the pernicious nonsense behind Mr. Shockley’s sperm bank, or that so clearly points to the efficacy of improved standards of living for increasing the so-called “general intelligence” of Americans.
Jensen attempts to cover all these difficulties with the classical ploy of hereditarians: I have the numbers, the rigor, and the objectivity; you have only hopes and emotion. He refers to criticisms of testing as “largely emotional, ad hoc, often self-contradictory”; they “convey attitudes and sentiments instead of information.” He depicts his own work, by contrast, as an “exhaustive review of the empirical research.” The ploy has worked so far. Jensen’s 800 pages of numbers have benumbed reporters who have not found the basic weaknesses of argument and who have assumed, on the more-is-better fallacy, that such length must reflect profundity.
But an argument is only as good as its premises and logic. Jensen may contemptuously dismiss criticisms as “armchair analysis,” but the copious citation of numbers cannot salvage an argument grounded on invalid premises. Lord Kelvin proved with numbers more rigorous than any psychometrician has ever derived that the earth could not be more than a few million years old—not enough time for Darwinian evolution. But the numbers rested on the false assumption that heat emanating from the earth’s interior reflected the cooling of an initially molten planet, whereas we now know that it arises largely through the decay of radioactive elements.
The computer people have a word for it, one of their most euphonious acronyms—GIGO, or garbage in, garbage out. Jensen’s problem is not garbage but irrelevancy (400 or so pages on bias) and fallacious premises (the equation of Spearman’s g with intelligence).
Numbers have undoubted powers to beguile and benumb, but critics must probe behind them to the character of arguments and the biases that motivate them. Léonce Manouvrier, the leading statistical anthropologist of the late nineteenth century, made this point with feeling when he disproved Paul Broca’s claim that the smaller brains of women reflected inferior intelligence:
Women displayed their talents and diplomas. They also invoked philosophical authorities. But they were opposed by numbers unknown to Condorcet or to John Stuart Mill. These numbers fell upon poor women like a sledge hammer, and they were accompanied by commentaries and sarcasms more ferocious than the most misogynist imprecations of certain church fathers. The theologians had asked if women had a soul. Several centuries later, some scientists were ready to refuse them a human intelligence.9
May 1, 1980
1. A.R. Jensen, “How much can we boost IQ and scholastic achievement?” Harvard Educational Review, 1969, 39:1-123.
2. L.S. Hearnshaw, Cyril Burt, Psychologist (Cornell University Press, 1979).
3. L.J. Kamin, The Science and Politics of IQ (John Wiley, 1974); see also articles by R.C. Lewontin, J. Hirsch, and D. Layzer in N.J. Block and G. Dworkin (eds.), The IQ Controversy (Pantheon, 1976).
4. Time, September 24, 1979, p. 49, and Newsweek, January 14, 1980, p. 59.
5. C. Spearman, “General intelligence objectively defined and measured,” American Journal of Psychology, 1904, 15:201-293.
6. Technically, a first principal component is not the same thing as the first axis of a factor analysis done in the principal components orientation. In my diagram, I work in two dimensions, fit two axes, and resolve all the information. This is called principal components analysis. In true factor analysis, one decides beforehand to abandon some information and to work in a space of reduced dimensionality. But the first principal component and the first factor axis in principal components orientation play the same conceptual role and differ only in mode of calculation: they are “best fit” axes that resolve more information in a set of vectors than any other axis could.
7. L.L. Thurstone, The Vectors of Mind (University of Chicago Press, 1935).
8. I think that Jensen senses his difficulty because, in one chart (p. 220), he gives the projection of each test both upon g as the first principal component and upon a set of simple structure axes distributing this g among them. Thus, he uses the same information twice and explains more than 100 percent of his information. Since big g’s appear in the same chart with large loadings on simple structure axes, one might be falsely led to infer that g remains large even in simple structure solutions.
9. L. Manouvrier, “Conclusions générales sur l’anthropologie des sexes et application sociales,” Revue de l’école d’anthropologie, 1903, 13:405-423.
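The “best fit” axis described in the note on principal components can be sketched numerically. In the short Python fragment below (the correlation matrix is invented for illustration, not taken from Jensen), the first principal component of four positively intercorrelated “tests” is simply the eigenvector of their correlation matrix with the largest eigenvalue: the single axis that resolves more of the total variance than any other axis could, and whose loadings all share one sign—the pattern that tempts hereditarians to read it as g.

```python
import numpy as np

# Hypothetical correlation matrix for four mental tests; the positive
# off-diagonal values are invented for illustration.
R = np.array([
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
])

# Eigendecomposition of a symmetric matrix: the eigenvectors are the
# principal axes, and each eigenvalue measures how much of the total
# variance (the trace, here 4.0) that axis resolves.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

first_pc = eigvecs[:, 0]                 # the "best fit" axis
share = eigvals[0] / eigvals.sum()       # fraction of variance it resolves

print("loadings on first component:", np.round(first_pc, 2))
print("fraction of variance resolved: %.2f" % share)
```

Rotating these same axes to a Thurstone-style simple structure redistributes the identical variance among several factors, which is why the apparent strength of g is partly a choice of coordinate system rather than a fact of nature.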