Google & the Future of Books: An Exchange

To the Editors:

In his recent article criticizing the Google settlement [“Google and the New Digital Future,” NYR, December 17, 2009], Robert Darnton fails to acknowledge the significant role that libraries have had in the creation of Google Book Search as well as the concrete steps they are taking to address the sorts of concerns he raises. Libraries are using Google-digitized volumes to create the “truly public library” that he seeks, and these same libraries are taking responsibility for the preservation of Google-digitized volumes.

More than thirty research libraries have made agreements with Google to digitize their collections as part of their long-standing tradition of providing the highest level of access to scholarly materials. These libraries have worked successfully with Google to ensure the integrity of their physical collections and to digitize those collections in accordance with broadly held standards for digital capture.

Many of these institutions have also shaped a coordinated strategy for preserving and providing access to their growing digital collections. In 2008 a group of twenty-five research libraries including the institutions of the Big Ten and the University of Chicago, the University of California system, and the University of Virginia joined together to create HathiTrust (www.hathitrust.org) specifically for these purposes. With the number of volumes digitized by Google soaring into the millions and with the expansion of Internet Archive and other digitization efforts, these libraries sought to ensure the long-term accessibility of this content.

The participating institutions, now including Columbia University, are committed to the persistence of the cultural record, and have centuries of experience in the public trust. The current members, whose collections make up 75 percent of all of the content in Google Book Search, underwrite the costs of this shared effort to preserve the published scholarly record in digital form, regardless of digitization vendor or source, on behalf of scholarship and the public good.

Providing for the long-term preservation of and access to this content is no small task, yet the HathiTrust partners have had significant success in creating a secure environment in a short amount of time. HathiTrust’s mission is “to contribute to the public good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge,” and it is already doing so for nearly five million volumes. This number will increase by millions as libraries continue their digitization projects with Google and others.

HathiTrust is a public good and provides as much access to its content as legally possible. Nearly one million volumes of the current holdings are in the public domain and accessible to anyone with a Web browser. This number will grow as the overall number of volumes continues to grow. HathiTrust provides a suite of services to scholars and others, including full-text and bibliographic search across the entire repository. All of HathiTrust’s services are separate from Google’s. HathiTrust partners have undertaken ongoing copyright review of orphan works to open access to volumes that are, in fact, in the public domain. Thousands of these volumes have already been opened, as well as a growing number of volumes that rights holders give HathiTrust permission to make available online.

Libraries are much further ahead in the game than Darnton would have readers believe. Although there are disappointments for Google partner libraries in the settlement agreement, libraries have worked to secure important privileges, including significant influence over the commercial pricing of Google’s Book Search product. The settlement also sanctions important uses of digital volumes, including those that are in copyright. These include providing access to content to users with print disabilities and using libraries’ digitized volumes in large-scale computational research. Opening the enormous body of Google-scanned content to new user populations and methods of inquiry will have a transformative effect on our ability to produce and analyze knowledge about our society, our heritage, and the world. We invite Harvard to join us in this endeavor!

Paul Courant, University Librarian and Dean of Libraries, University of Michigan
Laine Farley, Executive Director, California Digital Library
Paula Kaufman, University Librarian and Dean of Libraries, University of Illinois at Urbana-Champaign
John Leslie King, Vice Provost for Academic Information, University of Michigan
Brian Schottlaender, University Librarian, University of California, San Diego Libraries
Carolyn Walters, Interim Dean of Libraries, Indiana University
Brad Wheeler, Chief Information Officer, Indiana University
John Wilkin, Executive Director of HathiTrust and Associate University Librarian, Library Information Technology, University of Michigan

To the Editors:

In his article on the Google book digitization project Robert Darnton spoke in passing of faults “that mar Google’s enterprise.” He mentioned, among other things, “missing pages,” “omitted artwork,” and “censoring.” No doubt because I have not followed the controversy closely, these charges were news to me—and troubling, not least the idea of Google censoring books. It would be useful if Darnton could give any specific examples of such faults.

Anthony Lewis
Cambridge, Massachusetts

To the Editors:

Robert Darnton is right to suggest that the US government, or some other quasi-public consortium, should buy out Google’s vast digital library of orphaned and out-of-print books. If Google insists on retaining these holdings, it should be assigned the status of a public utility—like a gas, water, or electric company—and the appropriate regulatory authority should be set up. As with any monopoly of an essential public service, prices should be controlled, privacy should be protected, and open access should be guaranteed. Google is entitled to a reasonable return on its investment, but if future business decisions jeopardize this essential human resource, another provider should be found, and Google should be forced to divest.

Theodore Koditschek
Department of History
University of Missouri
Columbia, Missouri

Robert Darnton replies:

As my friends from Michigan know, I think the HathiTrust is a good thing and have applauded it from the beginning. It provides a remedy to one of the weaknesses in Google Book Search (GBS) by assuming responsibility for the preservation of the digital material. Although we had developed a preservation program at Harvard long before the creation of Hathi, we are eager to cooperate with it.

The subject of my article was not Hathi but rather the revised settlement between Google and the authors and publishers who have sued it for alleged infringement of copyright. As the Department of Justice argued in its memorandum against the original settlement, GBS uses the device of a class-action suit to obtain an exclusive license to digitize and market unclaimed books—that is, copyrighted books that are out of print, including millions of so-called “orphans,” whose copyright remains unclaimed. According to the DOJ, “This de facto exclusivity (at least as to orphan works) appears to create a dangerous probability that only Google would have the ability to market to libraries and other institutions a comprehensive digital-book subscription.”

The Amended Settlement Agreement (ASA) does not correct this basic flaw. Neither does Hathi, whose functions are limited primarily to preservation. As Pamela Samuelson, a professor at the law school of the University of California, Berkeley, has pointed out, Google’s potential competitors will not have access to orphan books unless Congress passes legislation authorizing them to do so:

As a result, the main orphan book scam of the GBS settlement—that Google will get a de facto compulsory license from the settlement class that authorizes the firm to commercialize all out-of-print books covered by the settlement, including importantly the orphans—remains a serious problem with GBS 2.0 [i.e., the ASA].

I am puzzled by the letter writers’ claim that Hathi amounts to a “truly public library.” Public libraries make books and other materials accessible free of charge to readers. But the ASA prohibits any preservation facility like Hathi from letting readers read its books, except under very restricted circumstances. Similar prohibitions prevent libraries from allowing users to read the digitized copies made from their own holdings and provided to them by Google as a “Library Digital Copy” (LDC). Users may see a table of contents and short snippets, and in university libraries some faculty members may read as many as five pages of a book in the LDC, provided that it is out of print.

These terms—see the ASA, section 7.2 (b. iv, vii, and x)—prevent Hathi from performing the main function of a library. If they were amended, the way might be cleared for a national digital library. In its current form, however, the ASA subjects library holdings, which should be considered as public assets, to a private speculation, and by doing so it reinforces a powerful commercial monopoly.

In response to Anthony Lewis’s letter, while I have discussed problems of Google’s accuracy in these pages,* I should explain that I cannot provide examples of censorship by Google, because the ASA is still working its way through the courts and has not yet been applied. The revised settlement does not use the term “censorship,” but section 3.7(e) states: “Google may, at its discretion, exclude particular Books from one or more Display Uses for editorial or non-editorial reasons.” It goes on to assert that because Google and the plaintiffs “value the principle of freedom of expression,” Google will notify the Book Rights Registry of any book it excludes. Notification of exclusion hardly serves as a substitute for inclusion.

Google’s digital database will therefore lack every book that Google chooses to exclude. The early version of the settlement limited this authorization to 15 percent of the books in the corpus, but the ASA makes no mention of a limit. Because Google will own the digital works in the database, it can dispose of its property as it pleases—just as Rupert Murdoch did in 1998 when he was extending his media empire into China and refused to let his company, HarperCollins, publish a book by Christopher Patten that criticized Chinese policy in Hong Kong.

I like Theodore Koditschek’s suggestion that Google be treated as a public utility subject to regulation in the public interest. If that seems unrealistic, one should consider a compromise solution, which would draw a line between the books digitized by Google that are strictly commercial and the books that are no longer in print, although some of them are still covered by copyright. Google would continue with its project to commercialize digital copies of books currently in print, sharing the proceeds with the rights holders. At the same time, it would continue to scan out-of-print books and to include them in a database that would constitute a separate, open-access repository. The rights holders of the in-copyright but out-of-print books in that database would be given the opportunity to choose to keep their books out of the open-access plan and, if they preferred, to include their books in Google’s commercial operation.

This opt-out provision would be adapted from the similar provision of the ASA, which permits rights holders to remove their works from Google Book Search. By doing so, it would take advantage of the class-action character of the original lawsuit in order to promote a nonprofit project dedicated to the public good. The books in the open-access repository would be protected against litigation without recourse to legislation by Congress, and they would be merged with books in the public domain, forming a gigantic database—that is, a national digital library. (Of the ten million books that Google has digitized, roughly two million are in the public domain and six million are out of print but still protected by copyright.)

This collection would grow as copyrights expired and as more research libraries—perhaps Harvard, the New York Public Library, and the Library of Congress (which holds 32 million catalogued books)—agreed to participate, for they would have no qualms about devoting their collections to a genuine public library as opposed to a private commercial enterprise. Suitable measures would be taken to protect privacy and to win the support of foreign rights holders so that the American digital library would be interoperable with the European digital library funded by the European Commission, and other digital repositories, creating an international library on a truly global scale. Meanwhile, Google, the Authors Guild, and the Association of American Publishers would continue to draw income from the separate digital database composed of books that are currently in print and of out-of-print books whose rights holders elected to participate in Google Book Search. Google and the plaintiffs would suffer no loss of income, and they would gain goodwill from having contributed to the public welfare.

* See “The Library in the New Age,” The New York Review, June 12, 2008. See also Geoff Nunberg, “Google Books: A Metadata Train Wreck,” Language Log, August 29, 2009.

