
Jensen’s Last Stand

Bias in Mental Testing

by Arthur R. Jensen
The Free Press, 786 pp., $29.95

I. Jensen Then and Now

There are many styles of retreat in the face of failure. As a first and most forthright strategy, one can simply be humble and contrite. Clarence Darrow once stated that if God really existed after all, and if he were, following his death, arraigned before God as judge with the twelve apostles in the jury box, he would simply step up to the bench, bow low, and say: “Gentlemen, I was wrong.” In a second, intermediate strategy—the stiff upper lip—one looks upon the bright side (or sliver) of admitted adversity. When Robert FitzRoy, Darwin’s captain on the Beagle, learned that Jemmy Button, the Fuegian native he had trained in English ways, had “reverted” completely to old habits within months of his return, FitzRoy took refuge in the thought that “a ship-wrecked seaman may hereafter receive help and kind treatment from Jemmy Button’s children; prompted, as they can hardly fail to be, by the traditions they will have heard of men of other lands.” As a third tactic, one proclaims triumph and punts hard. I remember Senator Aiken’s brilliant solution to the morass of Vietnam—that we should simply declare victory and get out.

Arthur Jensen has published an 800-page manifesto embodying this third strategy. To understand why it represents a retreat—and a failed retreat at that—we must review the history of its genesis. In his notorious article of 1969,1 the founding document of “Jensenism” as a public issue, Jensen maintained that compensatory education must fail because the black children that it attempted to aid were, on average, genetically inferior to white children in intelligence. He based his claim on a strong form of genetic argument: the heritability of IQ, he maintained, had been adequately estimated at about 0.8 among whites; therefore, the 15 point average difference in IQ scores between blacks and whites must be largely innate in origin.

The intervening decade between this article and the present book has not been kind to Arthur Jensen. First of all, the estimate of heritability, depending so heavily on Sir Cyril Burt’s faked data,2 is clearly too high. Second, and more important, the value of heritability within either the white or the black population carries no implication whatever about the causes for different average values of IQ between the two populations. (A group of very short people may have heritabilities for height well above 0.9, but still owe their relative stature entirely to poor nutrition.) Within and between group variation are entirely different phenomena; this is a lesson taught early in any basic genetics course. Jensen’s conflation of these two concepts marked his fundamental error.
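The height example in the parenthesis can be made concrete with a small simulation. What follows is only a toy sketch under stated assumptions, not anything drawn from Jensen's data: the function names, the 170 cm baseline, the two standard deviations, and the 10 cm nutritional penalty are all invented for illustration. Both groups draw their genetic values from the same distribution, so any difference between the group means is environmental by construction.

```python
import random
import statistics


def simulate_group(n, env_shift, rng, genetic_sd=3.0, env_sd=1.0, baseline=170.0):
    """Return (genetic_values, heights) for one hypothetical group.

    env_shift is a uniform environmental effect (for example, poor
    nutrition) applied to every member of the group.
    """
    genetic = [rng.gauss(0.0, genetic_sd) for _ in range(n)]
    heights = [baseline + g + rng.gauss(0.0, env_sd) + env_shift for g in genetic]
    return genetic, heights


def heritability(genetic, phenotype):
    """Share of phenotypic variance within one group that is genetic."""
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
g_a, h_a = simulate_group(5000, 0.0, rng)    # well-nourished group
g_b, h_b = simulate_group(5000, -10.0, rng)  # same genes, poor nutrition

h2_a = heritability(g_a, h_a)
h2_b = heritability(g_b, h_b)
height_gap = statistics.mean(h_a) - statistics.mean(h_b)
genetic_gap = statistics.mean(g_a) - statistics.mean(g_b)

print(f"heritability within A: {h2_a:.2f}")
print(f"heritability within B: {h2_b:.2f}")
print(f"gap in mean height:    {height_gap:.1f} cm")
print(f"gap in mean genotype:  {genetic_gap:.2f} cm")
```

In this model heritability within each group comes out near 0.9, yet the gap between the group means is produced entirely by the environmental shift. The within-group figure places no constraint on the cause of the between-group difference.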

I assume that Jensen now understands where he went seriously astray. The present book bypasses the issue of heritability entirely. In dismissing this previous bulwark of his system in just two paragraphs, Jensen simply states that the matter is too complicated for treatment here, though he relishes some of the most arcane and complex issues of psychometrics throughout his 800 pages. Heritability, he argues, “is a highly technical and complex affair involving the principles and methods of quantitative genetics.” “Because even an elementary explication of heritability analysis is beyond the scope of this book, the interested reader must be referred elsewhere.” His list of appended references includes not a single one of several cogent critiques directed against his original thesis.3

Moreover, Jensen now claims that there really isn’t any need to talk about genetics anyway: “Because we have no estimate of the individual’s genotype that is independent of the test score, there is really no point in estimating genotypic values.” In fact, he now virtually argues that the subject of causation should be dropped entirely: “The constructors, publishers, and users of tests are under no obligation to explain the causes of the statistical differences in test scores between various sub-populations. They can remain agnostic on that issue.” Yet his acceptance of this very obligation was the motivating theme of his 1969 article.

Am I not being unkind in bringing all this up? A man should be allowed to change his mind with grace, and to save face in his expiation. But Arthur Jensen hasn’t altered his basic tune at all. He is simply using a different, and more indirect, argument to prop it up. And he has buried the central fallacies of that argument so deeply among the apparent rigor of these 800 pages of lists, figures, and charts that no commentator in the mass media has yet ferreted them out.4 Jensen’s fundamental claim is still about innateness. Indeed, it is still the same claim: blacks are less intelligent than whites and this difference cannot be attributed to environment.

In reasserting his 1969 claim in its more indirect form, Jensen constructs an argument of three parts:

  1. The average difference in IQ between whites and blacks is about 15 points, or one standard deviation. Other tests of intelligence show comparable differences.

  2. The tests are unbiased.

  3. IQ (and other valid tests of the same mental attributes) measure something that we can legitimately call “intelligence.”

Note that although the argument says nothing about genetics or innateness, it seems to lead inexorably in that direction. After all, if blacks perform more poorly than whites on unbiased tests that measure intelligence, then blacks must be less intelligent than whites for reasons unrelated to environmental deprivation (our usual supposition for the cause of “bias,” as we use the term in the vernacular). What reasons besides innateness are left?

Of the three arguments, only the first, an undisputed fact, compels any assent in our usual, vernacular interpretation. It also leads nowhere because it implies nothing whatever about the reasons for the difference. The tests themselves may record nothing of interest; potential reasons for difference span the entire range from pure environmental imposition to pure innateness. Jensen’s argument becomes meaningful and controversial only if the second and third points are valid in our vernacular understanding. I believe that they are not valid and that this book, despite its wealth of interesting technical detail for psychometricians, therefore contains no important or general message to enlighten the general concern that we all must feel for the issue of human potential.

II. The Meanings of Bias

Jensen has titled his book Bias in Mental Testing, and most of its ample length is dedicated to proving either that there isn’t any, or that it can be recognized and corrected if there is. And I’m sure that he’s right.

The last paragraph may, at first glance, seem utterly destructive to my case, but it isn’t. Things are seldom what they seem in statistics, and a layman’s understanding of the field has been plagued by important differences between vernacular and technical meanings of terms. “Significance” and “discrimination” may provide the two most notable cases of difference between English vernacular and statistician’s jargon, but “bias” belongs in the same category. In proving that tests are not biased, Jensen speaks to statisticians’ interests and not at all to what the public understands by the common charge that IQ tests are biased.

Average black IQ in America is about 85, average white IQ about 100. The charge of bias, in our ordinary understanding of the word, holds that this poorer performance of blacks is a result of environmental deprivation relative to whites, and that it does not reflect inherent ability. The vernacular charge of bias (I shall call it V-bias) is linked to the idea of fairness and maintains that blacks have received a poor shake for reasons of education and upbringing, rather than of nature.

For two months, I have tabulated every use of the word I have seen in the popular press, and all have conformed to this understanding. The New York Times, for example, reported that Federal Judge Robert Carter “has ruled that the examination [for police officers] was biased” because so few black and Hispanic applicants scored among the highest grades. And The Sacramento Bee outlined Judge Robert F. Peckham’s decision to ban IQ tests as a criterion for placing children in EMR (educable mentally retarded) classes in California. Peckham was disturbed by the preponderance of blacks in such classes and ruled “that there probably could not be any substantial disproportion of blacks…if the process of selection was unbiased.” In other words, both assumed that the rarity of high scores for blacks or low scores for whites does not reflect natural aptitude fairly. If Jensen had proved that tests are unbiased in this sense (V-bias), he would have made an important and deeply troubling point.

But “bias,” to a psychometrician, has an utterly different and much narrower meaning—and Jensen addresses himself only to this technical sense (which I shall call S, or statistical bias). An intelligence test is S-biased in assessing two groups only in the following circumstance: Suppose that we plot the scores on an intelligence test showing them in relation to what we wish to predict from the test—job performance or school grades, for example. The test is unbiased in a statistician’s sense if and only if points for blacks and whites fall along the same line—that is, if blacks and whites, plotted separately, differ neither in slope, “y-intercept” (the point of intersection between the line for whites or blacks and the vertical axis) nor standard error. If this seems confusing, consider Figure 1, an example of intercept bias. Whites and blacks have the same slope, but whites have a higher y-intercept.

It is not difficult to see why psychometricians want to rid themselves of S-bias; for in an S-biased test, the same score yields different predictions based upon group membership. In Figure 1, an IQ of 100 predicts poorer grades for a black than for a white. No sensible tester wants to construct an instrument in which the same score means different things for different kinds of people.
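The definition of S-bias amounts to a comparison of two regression lines, and the intercept case is easy to sketch. The example below is a hypothetical illustration only: the group labels A and B, the grade scale, the slope of 0.02, and the intercepts of 1.0 and 0.5 are assumptions chosen to mimic the situation described for Figure 1, and `fit_line` is an ordinary least-squares helper written for this sketch. It checks slope and intercept only and leaves standard error aside.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_group(n, intercept, slope, rng, noise_sd=0.2):
    """Synthetic (test score, grade) pairs scattered around one line."""
    scores = [rng.gauss(100.0, 15.0) for _ in range(n)]
    grades = [intercept + slope * s + rng.gauss(0.0, noise_sd) for s in scores]
    return scores, grades


rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Assumed setup: identical slopes, but group A sits on the higher line.
scores_a, grades_a = make_group(2000, 1.0, 0.02, rng)
scores_b, grades_b = make_group(2000, 0.5, 0.02, rng)

slope_a, intercept_a = fit_line(scores_a, grades_a)
slope_b, intercept_b = fit_line(scores_b, grades_b)

# The same test score of 100 yields two different predictions.
pred_a = intercept_a + slope_a * 100
pred_b = intercept_b + slope_b * 100

print(f"slopes:      A={slope_a:.3f}  B={slope_b:.3f}")
print(f"intercepts:  A={intercept_a:.2f}  B={intercept_b:.2f}")
print(f"predicted grade at score 100:  A={pred_a:.2f}  B={pred_b:.2f}")
```

A test would count as S-unbiased in this sketch only if the two fitted lines coincided. Here the slopes agree but the intercepts do not, so a score of 100 predicts a lower grade for a member of group B than for a member of group A.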

Jensen devotes most of his book to showing that S-bias does not affect mental tests (or that it can be corrected when it does exist). Yet I found nothing surprising in his densely documented demonstration that tests are unbiased in this sense. It would be a poor reflection indeed on the technical competence of psychometricians if, after nearly a century of effort, they had found no way to eliminate such an elementary and undesirable effect.

Thus, in saying that the tests are unbiased, Jensen has only managed to show that the lower black and higher white mean scores lie on the same line (see Figure 2). And this unsurprising demonstration says nothing at all about the vernacular charge of bias. Does the lower black mean reflect environmental deprivation rather than inherent ability (V-bias)?
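A toy model can show how a test may pass the S-bias check while the gap in means is environmental in origin. The sketch below rests on loud assumptions: both hypothetical groups share one underlying score distribution, group B receives a uniform 15-point environmental penalty, and the predicted outcome depends only on the observed score through a single shared line. None of these numbers comes from Jensen's book, and the model claims nothing about real populations.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_group(n, env_shift, rng):
    # Assumption: identical underlying distribution in both groups;
    # env_shift is a purely environmental penalty on every score.
    scores = [rng.gauss(100.0, 15.0) + env_shift for _ in range(n)]
    # Assumption: the outcome follows one shared line in the observed score.
    outcomes = [0.5 + 0.02 * s + rng.gauss(0.0, 0.2) for s in scores]
    return scores, outcomes


rng = random.Random(2)  # fixed seed so the sketch is repeatable
scores_a, outcomes_a = make_group(5000, 0.0, rng)
scores_b, outcomes_b = make_group(5000, -15.0, rng)

slope_a, intercept_a = fit_line(scores_a, outcomes_a)
slope_b, intercept_b = fit_line(scores_b, outcomes_b)
score_gap = statistics.mean(scores_a) - statistics.mean(scores_b)

print(f"slopes:      A={slope_a:.3f}  B={slope_b:.3f}")
print(f"intercepts:  A={intercept_a:.2f}  B={intercept_b:.2f}")
print(f"gap in mean score: {score_gap:.1f}")
```

The two fitted lines nearly coincide, so the test is S-unbiased in this model, while the gap of about 15 points is environmental by construction. Passing the statistical check therefore leaves the question of V-bias open.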

Of course, Jensen admits this. He distinguishes his notion of bias (S-bias) from our vernacular idea of fairness to all cultures. He also admits that such fairness cannot be defined objectively and thus undermines his own larger case: “One can determine with objective statistical precision how and to what degree a test is biased with respect to members of particular subpopulations. But no such objective determination can be made of the degree of culture-loadedness of a test. That attribute remains a subjective and, hence, fallible judgment…. The term ‘bias’ is to be kept distinct from the concept of fairness-unfairness.”

  1. A.R. Jensen, “How much can we boost IQ and scholastic achievement?” Harvard Educational Review, 39 (1969), pp. 1-123.

  2. L.S. Hearnshaw, Cyril Burt, Psychologist (Cornell University Press, 1979).

  3. L.J. Kamin, The Science and Politics of IQ (John Wiley, 1974); see also articles by R.C. Lewontin, J. Hirsch, and D. Layzer in N.J. Block and G. Dworkin (eds.), The IQ Controversy (Pantheon, 1976).

  4. Time, September 24, 1979, p. 49, and Newsweek, January 14, 1980, p. 59.
