
The IBM Shakespeare

In the old days only the Bible rated a concordance, for in it alone every word counted; as late as Johnson’s Dictionary the word is defined as meaning “a book which shows in how many texts of Scripture any word occurs.” There were numerous biblical concordances, but the one everybody knows about is Cruden’s. Alexander Cruden, a contemporary of Johnson’s, known as Alexander the Corrector from his trade of proofreader, published in 1737 the work from which he expected fame. There were signs that even before he started the book he was not quite right in the head, but disappointment at its failure to make his fortune drove him madder, so that in later life he became a rather well-liked, serious, and useful nut about town. A poor return for making so big a concordance by hand is surely enough to drive anybody out of his mind.

The example of Cruden, however, proved no deterrent, and soon Shakespeare, promoted into the sacred book class, was ripe for a concordance, which duly issued from the hands of Mrs. Cowden Clarke in 1845. This first attempt was replaced in 1894 by Bartlett’s New and Complete Concordance, which remains, though its time is running out, the standard work. The reason why its time is running out is that a concordance no longer needs to be hand-stitched; a computer will fabricate it, not only sorting out the words but even printing the pages. One important use is in descriptive linguistics, but there are several literary computer-concordances around already, of Stevens and Yeats, for example.

Now Professor Spevack of the University of Münster takes on Shakespeare with an IBM 7094, which, if the whole thing weren’t so scientifically chilly, one would like to call a Cruden. The work, which will be completed next year, is intended less for pious exercises than to increase “the possibilities for research and interpretation.” It will contain a matter of 16,000 columns. This first volume deals with the comedies that appear in the First Folio of 1623 (that is, all the comedies except Pericles and The Two Noble Kinsmen, which will be tucked in with the tragedies in Volume III; so, of course, will Cymbeline and Troilus and Cressida, comedies which were printed with the tragedies in 1623). The second volume will have the history plays and the non-dramatic works; the third will be as described. (No part of Sir Thomas More, it seems, will be included.) Each of these first three volumes contains a separate concordance to each play, and then another for every character in each play. Volumes IV-VI are to be a general concordance of the whole oeuvre.

Computers work faster even than Cruden, though he did his in a year; and they stay sane. But of course you must tell your IBM what you want; it will do that, within minor limitations, exactly, but it will do no more. The compiler must have definite notions about how the concordance is going to be used. What are Mr. Spevack’s?

Not, in this volume, the conventional notions. The words, though they are of course given line-references, are not provided with contexts. A computer can center the key word and print its context out to two justified margins; or it can, as in the new Cornell Blake, give the necessary context more grammatically; but nothing of the kind is done here. Perhaps the last three volumes, in which the publicity promises us a “meaningful context,” will remedy this. Certainly the principal use of a concordance, however the possibilities for research might be increased by other data, is to do what Cruden and Bartlett did: let you see all the contexts in which a word is used. This volume only gives you references, and there is the added difficulty that the computer cannot distinguish homographs[1]; for instance, a word such as can has to be marked with an asterisk to warn you that out of the, say, thirty uses listed an unstated number are verbal, as against substantival, and that you’ll have to look them all up to see which is which. Of course it would have grossly inflated this already large book if the play concordances had set out the context of every word.
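For readers who have not seen the centered layout mentioned above, here is a minimal sketch in Python of a keyword-in-context listing. The function name `kwic`, the `width` parameter, and the sample sentence are all invented for illustration; nothing here reflects how the IBM 7094 programs behind the Concordance actually worked.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per occurrence, with the keyword in a fixed column.

    Hypothetical helper: `width` is the number of characters of context
    kept on each side of the keyword.
    """
    lines = []
    # Whole-word, case-insensitive match. Homographs are NOT told apart.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad the left context on its left and the right context on its
        # right, so every line has the same length and the keyword lines up.
        lines.append(f"{left:>{width}}{m.group(0)}{right:<{width}}")
    return lines

# Invented sample sentence, not a quotation from the plays.
sample = "I can do it. Bring the can here, if you can."
for line in kwic(sample, "can", width=15):
    print(line)
```

The sketch finds all three occurrences of can, verb and noun alike, which is exactly the homograph difficulty: the listing shows the contexts, but it is still the reader who must decide which is which.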

Mr. Spevack really does include every word. The first entry in The Tempest is, not surprisingly, a*. It occurs 289 times in the play. Computers can be selective, I’m told, on what are known as ONLY and ALLBUT programmings. But Mr. Spevack has not required this of his Cruden. He has accepted everything and then asked the computer certain questions about the 289 a’s. We learn that 196 of them occur in verse, 93 in prose; that of the 16,036 words in the play a accounts for a proportion of 1.802 percent, as against the’s 2.725 percent. Wondering how I could make use of this fact, I got out a/the figures for some other plays: Two Gentlemen of Verona (1.865 percent, 2.256 percent); Merry Wives (2.069 percent, 2.694 percent); Measure for Measure (1.880 percent, 3.046 percent); Comedy of Errors (1.642 percent, 2.944 percent); Love’s Labour’s Lost (2.244 percent, 3.913 percent), and that will do. Some of these variations are mildly surprising, I think: why should the early Love’s Labour’s Lost have so many more articles, both definite and indefinite, than the early Comedy of Errors, and something like twenty-five percent more the’s than any other comedy? Nothing in these figures, or the others I won’t trouble you with, suggests that it has anything to do with the verse-prose proportions of the plays, or with anything else that I can see. Of course Mr. Spevack is providing material usefully treated, not giving answers.
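The arithmetic behind these percentages is simple enough to check. The following Python sketch recomputes the Tempest figure for a from the counts quoted above and compares the share of the across the six plays mentioned. The names `relative_frequency` and `article_share` are invented for illustration, and the data are only the figures given in this review, not the Concordance's full tables.

```python
def relative_frequency(count, total):
    """Occurrences of a word as a percentage of all words, to three decimals."""
    return round(100 * count / total, 3)

# Figures quoted in the review: (percent for "a", percent for "the").
article_share = {
    "Tempest": (1.802, 2.725),
    "Two Gentlemen of Verona": (1.865, 2.256),
    "Merry Wives": (2.069, 2.694),
    "Measure for Measure": (1.880, 3.046),
    "Comedy of Errors": (1.642, 2.944),
    "Love's Labour's Lost": (2.244, 3.913),
}

# 289 occurrences of "a" among the 16,036 words of The Tempest.
tempest_a = relative_frequency(289, 16036)
print(tempest_a)  # 1.802

# Rank the plays by their share of "the" and compare the top two.
ranked = sorted(article_share.items(), key=lambda kv: kv[1][1], reverse=True)
(first, (_, first_the)), (second, (_, second_the)) = ranked[:2]
excess = first_the / second_the - 1
print(first, "exceeds", second, "by about", round(100 * excess), "percent")
```

On these six plays alone, Love's Labour's Lost has a share of the roughly 28 percent higher than the runner-up, Measure for Measure; the comparison says nothing about the comedies not listed here.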

Each play concordance leads off with a table of the number of speeches, lines, and words in the play, saying how many of these speeches, lines and words are in verse, prose, or split between them; and how many different words are used in each play. This last sounds interesting, so I’ll give the figures in what will pass for the chronological order of the plays:

Comedy of Errors 2522

Taming of the Shrew 3240

Two Gentlemen 2718

L. L. L. 3772

Midsummer N. D. 2984

Merchant of V. 3267

Merry Wives 3267

Much Ado 2954

As You Like It 3248

Twelfth Night 3096

All’s Well 3513

Winter’s Tale 3913

Tempest 3149

There are plenty of puzzles here. Obviously the longer the play the more fresh words you tend to use, but that doesn’t explain everything. Nor does date (L. L. L. is early). Theme counts for a lot, clearly; much of L. L. L. is about language, especially affected language, with examples. Two Gentlemen is, in the verse parts, conventionally pretty and deliberately limited. But one feels the whole thing is very mixed up, and explanations would be welcome.

Mr. Spevack has a special interest in relative frequencies. Of course grossly high frequencies of occurrence (world in Antony and Cleopatra, and lots of others) are sufficiently evident to the close reader already, and are often discussed. But this Concordance will make it easy to spot anomalous frequencies of less impressive words—of that or but or by. Will this help in the study of attributions, the kind of thing they did on the Pauline epistles by a computer-analysis of Greek particles? The most learned advice I can come by is that Mr. Spevack would have set it up differently if that had been his intention, since one needs complex grammatical data as well. I did a sample (C. of E., M. for M., W.T.—a good chronological spread) on that, but, and by, and can report that that appears to decline steadily, and so does by. I don’t know why. Anyway, the comedies, except for Pericles and Two Noble Kinsmen, contain no play now seriously thought to contain a lot of non-Shakespearian matter, so the disintegrators will have to wait until the next volume offers statistics on Henry VI and Henry VIII.

Mr. Spevack in fact tells us little about his intentions, but promises to say more about them later. Negatively, he has tried to avoid “editorial” interference and to keep the data “pure.” He states that “all the words are indexed in exactly the form in which they appear in the text of Shakespeare,” though departures from copy-text are marked with a slash. This is slightly baffling. In the first place the words aren’t old-spelling; in the second, Mr. Spevack gives departures from the copy-text without giving the copy-text itself. Take the famous Tempest crux—“Most busie lest when I doe it.” Neither busy nor lest (or least) gets into the concordance; what you will find is busilest, with the appropriate slash. I happen to like busilest, and read it in my own edition, but it is not for that reason, nor indeed by his own preference, that Mr. Spevack has let it oust the text of 1623. It is busilest because Professor G. Blakemore Evans says so in his New Riverside Edition, still not published. Similarly, the reason why you won’t find presence in the Tempest concordance (though I am sure it is the right reading for present in I.i. 22 or thereabouts) is that Mr. Evans is not so persuaded.

I doubt if Mr. Spevack could have done better than to follow Mr. Evans’s authoritative new text, though it means that the concordance is pretty well unusable until that text appears, if only because the line-references, which are all you get, mean nothing. Then there are other snags, some unavoidable, some not. Most busie lest is a venerable crux, but an isolated disturbance in a goodish text. What will happen when there is radical disagreement still about the relative authority of Quarto and Folio, as in Othello? On the present plan we shall get what Professor Evans (and who better if there must be a single master) accepts, though this may be what other competent editors, weighing the evidence differently, reject. There is surely a case for giving both or all rival readings, even, perhaps, an odd emendation. Will no one, in the Henry V concordance, look for babble?

Another difficulty arising from this adherence to one edition concerns prose-verse proportions, about which Mr. Spevack is so meticulous that every word in the concordance is marked as occurring in P(rose) or not. First, the New Riverside is a double-column edition, which makes the prose count artificially high[2]. This can be offset. What is less tractable is the difficulty that sometimes occurs as to whether a passage is in prose or verse. This crops up in Tempest, III. ii. For various reasons, compositors weren’t wholly dependable about verse and prose, and when you find this: “How does thy honor? Let me lick thy shoe: I’ll not serve him, he is not valiant,” from Caliban, who often does break into verse, you might agree with Johnson and call it two verse lines. Of course there are perfectly good reasons for not tampering with it; prose can take a strong iambic beat, especially in certain situations or characters; and Mr. Evans, I deduce, keeps this as prose (presumably three lines). So it is worth remembering that some of the proportions would be substantially altered if the concordance were based on a different edition. Mr. Spevack urges us, prompted by his slashes, to look up difficult or dubious points in Mr. Evans’s textual notes, and it is necessary advice.

The character concordance is another new idea. Again the principle is to have everybody in, including multiple First Gentlemen, since you can’t always be sure the same Gentleman is the first in two separate scenes. The value of this apparently finicky work is to facilitate vocabulary tests on individual characters, though hardly as between First Gentleman (i) and First Gentleman (ii). You can, for example, look up use-usurer-interest in Merchant of Venice and Shylock 2,1,3,1. Gentle is used fourteen times—apart from gentleman/men 3, gentleness 1, gentry 1—and gentile not at all, though part of the point is the interchangeability of these words; Shylock uses absolutely none of them. On the other hand Jew and its derivatives, plus Hebrew, add up to 73, of which Shylock has 10.

These statistics certainly have their interest, though some might feel skeptical about their utility. In All’s Well, noble and its compounds occur twenty-five times, though the highest personal score (Bertram’s) is 5. That mysterious figure “A Gentleman Astringer” may or may not seem less mysterious if you know he has eight speeches, all in verse, twenty-one lines, all in verse, four split lines, 125 words, all in verse, and uses eighty-eight different words. He accounts for .854 percent of the speeches, .696 percent of the lines, and .554 percent of the words in the play. Autolycus in The Winter’s Tale has 2357 words, 329 of them in verse; he uses 852 different words, and accounts for 8.847 percent of the speeches, 8.811 percent of the lines, and 9.725 percent of the words in the play. Berowne’s fat part in Love’s Labour’s Lost accounts for 25.238 percent of the speeches, 22.092 percent of the lines, 22.864 percent of the words. He speaks 4809 words, 1444 of them different, and 4198 of them in verse. Armado’s much smaller part (1736 words) uses 677 different ones, and so has a higher ratio of different-word-count to total-word-count, though this probably illustrates only some law of diminishing returns applying to lexical novelty.
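The ratio of different words to total words can be worked out for the four parts just mentioned. This Python sketch uses only the counts quoted above; the names `parts` and `type_token_ratio` are invented for illustration and are not terms the Concordance itself uses.

```python
def type_token_ratio(total_words, different_words):
    """Share of a part's words that are distinct (different / total)."""
    return different_words / total_words

# (total words, different words) as quoted in the review.
parts = {
    "A Gentleman Astringer": (125, 88),
    "Armado": (1736, 677),
    "Autolycus": (2357, 852),
    "Berowne": (4809, 1444),
}

# List the parts from shortest to longest with their ratios.
by_length = sorted(parts.items(), key=lambda kv: kv[1][0])
ratios = [type_token_ratio(total, different) for _, (total, different) in by_length]
for (name, (total, _)), ratio in zip(by_length, ratios):
    print(f"{name:<22}{total:>6}{ratio:>8.3f}")
```

In this small sample the ratio falls steadily as the part grows, from about 0.70 for the Astringer to about 0.30 for Berowne. That is consistent with the suspicion that the measure mostly reflects the length of a part, so it is a doubtful guide when comparing characters whose parts differ much in size.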

Obviously it will be easier to comment on the utility of this work when it is wholly available, and when its indispensable companion, The New Riverside Edition, has appeared. Nevertheless there can be little doubt that it will replace existing concordances, and that volumes IV-VI in particular will be the modern Bartlett. We can’t be sure as yet what IBM 7094 and the other resources, not defined, of the German Computer Center at Darmstadt, will be asked to provide by way of additional information in the more conventional volumes, but in spite of one or two alphabetical aberrations they can’t control (dar’d must precede dare) there’s no doubt that the modern Bartlett will be compiled, justified, analyzed, and printed by a clean machine much better than a mad human could do it; and that we shall learn, inter alia, what proportion of the poet’s total vocabulary is claimed by such words as love, black, power, and will. But his favorite word will, I predict, turn out to be the.

  1. There is to be a list of 700 homographs in Volume VI.

  2. Only, of course, in the computation of lines; on speeches and words it makes no difference. I have checked the Spevack frequencies against those given in Brian Vickers’s The Artistry of Shakespeare’s Prose (Barnes and Noble, 452 pp., $12.00) and find some minor discrepancies; Vickers uses the Old Cambridge single-column lines. I’ve no doubt, by the way, that Mr. Vickers would have been extremely glad to have the concordance on his desk while writing this big and intelligent book; probably a more advanced approach to stylistics than his will need it even more.
