Who Will Digitize the World's Books? | Robert Darnton

In response to:

The Library in the New Age from the June 12, 2008 issue

To the Editors:

In his recent essay, “The Library in the New Age” [NYR, June 12], Robert Darnton extols the value of Google’s project to digitize the collections of major research libraries. As he puts it, it is a way to make “all book learning available to all people.” While there is much truth in this statement, there are some important considerations about the Google project that should be raised.

From the user’s perspective, the possibility of using the Internet to access a book, particularly a hard-to-find book, from one of the large libraries of the world is obviously wonderful. However, it is important to clarify what Google is offering: it is not a digital text that the library will be able to share unconditionally with others. In its contracts with the nineteen libraries now in its consortium, Google has stipulated that the “Universal Digital Copy” of digitized books it provides must be protected from non-Google Web software; and that the number of downloads from texts digitized by Google will be limited. Only Google can aggregate collections of different libraries in order to create the larger digital database that is the most valuable part of the consortium project.

Put another way, Google has strictly limited the “computational potential” of digitized books, that is, the possibility of their being used for various kinds of digital analysis. The importance of the “computational potential” of digital documents has been recognized only recently and it is not always well understood. Currently, one of the familiar advantages of digitized texts is that it is easy to construct indexes from them; but researchers are now exploring many other possibilities, such as “text mining,” by which facts, patterns, and semantic elements of texts can be identified, analyzed, and assembled in new ways. Such techniques make it easier to extract new knowledge from large sets of texts, and they may also lead to better machine translation. With Google’s digitization, for example, it is possible to conduct advanced text mining within a single library’s collection; but only Google can provide access through its own Web site to the entire pool of scanned books in the nineteen libraries with which it now has contracts.

In doing so, Google is seeking to protect itself from other competitors. But it also seriously restricts what a library can do with its digital collection, including the extent to which it is able to share digitized texts with other libraries. To those who raise this point, Google has emphasized that its contracts with libraries are nonexclusive. In other words, the libraries are free to redigitize their collections on their own or with another partner, and on more liberal terms. However, what library, having had its collection digitized by Google, would find the means, financial and technical, to redo the job in a more satisfactory manner?

It appears that Google is striving to become the main dispenser of algorithmic power over digital books. By monopolizing much of the computational potential of such books, Google is positioning itself as the operating system of the digital document world. Digital texts already dominate some areas of knowledge. To give a single company such a grip on the collective memory of the world, its analysis, and even its meaning is frightening to say the least.

Dozens of libraries have understood the danger of the Google Book maneuver and have joined the Open Content Alliance (OCA). They include the British Library, the National Library of Australia, the Boston Library Consortium, Columbia University, the University of Toronto, the University of Chicago, Johns Hopkins, and the University of California libraries, to name only a few. Like Google Books and unlike most other digitization projects that operate on a much smaller scale, the OCA seeks to promote large-scale digitization, but it does so without putting shackles on the participating libraries. Alas, the OCA has nothing like Google’s deep pockets, and the recent withdrawal of Microsoft from the alliance makes the OCA’s position even more difficult.

But there may be some hope in this situation. Since many different groups have an interest in the free availability of digital texts, the process of digitization itself could be distributed among a wide variety of libraries and other independent groups, much in the way of contributions to Wikipedia and Project Gutenberg. Digitization clubs could emerge not only in public libraries but in schools and museums. In short, mass digitization projects should be designed in ways that are not dependent on market-based corporations or on government subsidies, but can nevertheless profit from forms of support from either kind of institution.

Libraries can have a very important part in promoting these projects and enforcing the standards that must accompany them. In so doing, they would be acting as institutional citizens of the digital document age, and not as grateful (and somewhat passive) consumers of Google’s apparent largesse.

Jean-Claude Guédon

Professor of Comparative Literature

University of Montreal

Montreal, Canada

To the Editors:

I have read Robert Darnton’s “The Library in the New Age” with great pleasure and interest, but would like to make one point on a minor detail. Emphasizing that movable type did not take off in East Asia in the same way it did in Europe, the article inadvertently misrepresents the importance of books printed with wooden blocks in China, Korea, and Japan, where this form of printing was continually and widely used up to the end of the nineteenth century.

No less than European movable type, woodblock printing was capable of producing vast print runs that could serve a very large and growing reading public with modestly priced publications. In fact, the Princeton librarian Martin Heijdra has calculated that movable type as it was used in Asia was less economical than woodblock printing unless print runs were quite small. The very cheap mass-market novels Maurice Courant saw hawkers sell on the streets of Seoul in the 1890s were printed with wooden blocks (whereas palace ladies would read novels that were handwritten in beautiful calligraphy, on the finest paper).

Woodblock printing had the added advantage that it also lent itself to printing on demand. An eighteenth-century Korean gentleman who wanted a particular book would consult a printed catalog that stated where the blocks for that title were kept (in special, well-aired storage facilities in local government offices, private academies, temples, etc.), and send a servant to have a single copy made. This meant that these titles never ran out of stock.

The actual printing did not require an elaborate printing press. For every two pages a separate block was carved. The block was then inked and covered with a piece of paper, which was gently pressed down, section by section. Once such a double page was printed, it was folded in the middle, and sewn together with the other pages at the open end to make a book. Thus there was no printing on the back of the paper; recto and verso were created by folding the sheet. All this had to be done with care, but did not require great skill, which facilitated the printing of books.

For movable type, the procedure was the same. The text would be set in a tray, the characters would be inked, and an impression produced in exactly the same manner. In the circumstances this was an efficient way of printing, which would not cost more than the price of the paper and the time a servant needed to do the job. This changed when mechanized Western printing presses were introduced to serve an increasingly commercial market.

Boudewijn Walraven

Professor of Korean Studies

Leiden University

Leiden, the Netherlands

Robert Darnton replies:

I am happy to be reminded about the importance of woodblock printing in early modern East Asia and of the Open Content Alliance in today’s digital world.

Woodblock printing also flourished in the West, especially in the production of popular broadsides, which combined images and text. Once carvers perfected the technique of cutting at right angles to the grain instead of parallel with it, the blocks proved to be much tougher than metal type. But Gutenberg’s invention made it far more efficient to compose pages from type arranged in the compartments of a case. And of course, this procedure could be adapted more easily to Western languages, in which the printer’s alphabet usually contained only twenty-three letters, than to East Asian languages with thousands of ideographs.

I share Jean-Claude Guédon’s worry about the danger of one company monopolizing the “computational potential” of digitized texts, and I agree that the Open Content Alliance is a good thing. But is it an adequate alternative to Google? Grassroots digitizing may help a thousand flowers bloom. Our Open Collections Program at Harvard—a project for digitizing public-domain material from special collections on topics such as immigration and women at work—has already made hundreds of thousands of pages freely available on-line. But we need to search, mine, and study millions of volumes from all the collections of our research libraries.

Libraries have accumulated those volumes at great cost over many generations, but they have kept most of them within their walls. Digital technology now makes it possible for this common intellectual heritage to come within the range of the common man and woman. Yet corporate interests, flawed copyright laws, unfair restrictions on fair use, and many other obstacles block the public’s access to this public good. By removing those obstacles, the United States Congress can clear the way for a new phase in the democratization of knowledge. For my part, I think congressional action is required to align the digital landscape with the public good.

This Issue

August 14, 2008
