Six Reasons Google Books Failed

John Pope-Hennessey; drawing by David Levine

Judge Denny Chin’s opinion in rejecting the settlement between Google and the authors and publishers who sued it for infringement of their copyrights can be read both as a map of wrong turns taken in the past and as an invitation to design a better route into the digital future. Extrapolating from the dense, 48-page text that accompanied the judge’s March 23 decision, it is possible to locate six crucial points where things went awry:

First, Google abandoned its original plan to digitize books in order to provide online searching. According to that plan, you would have been able to use Google to search the contents of books for a particular word or brief passage, but would not have been able to view or download a lengthy excerpt or an entire book. Thus, Google could have justified its display of snippets of text in the search results by invoking the doctrine of fair use. In this way, it might have won its case against the plaintiffs, the Authors Guild and the Association of American Publishers, and at the same time it could have helped revive fair use as a legitimate means of spreading knowledge—for example, in making digitized material available for teaching purposes.

Instead, Google chose to make its opponents its partners in a gigantic new library and book business, Google Book Search. The business plan led to a second misstep, because it included a dubious opt-out clause. Authors of out-of-print books who failed to notify Google of their refusal to participate in its project were deemed to have accepted it. (If enough of those authors could be located or volunteered to consent to the settlement, Google Book Search might build up a large database of books published since 1923. But the logistics and the transaction costs might make the task unfeasible, and the problem of orphan books would remain unsolvable without congressional legislation.)

Third, in setting terms for the digitization of orphan books—copyrighted works whose rights holders are not known—the settlement eliminated the possibility of competition. It gave Google exclusive protection against legal action by any rights holders who might be identified—no small matter, as there are probably several million orphan books (recent estimates go as high as five million), and the damages for copyright infringement could begin at $100,000 per title. This provision made Google and its partners effective proprietors of works they had not created. According to the original version of the settlement, they were to receive revenue from the sale of the orphan books according to a standard formula for dividing the pie: 37 percent to Google, 63 percent to the plaintiffs. That provision was corrected in a revised version of the settlement, but the Amended Settlement Agreement (ASA) continued to give Google legal protection that would be denied to any of its potential competitors. It amounted to changing copyright law by litigation instead of legislation.

Fourth, rights held by authors and publishers located outside the United States raised similar problems. Foreign rights holders objected that the digitization of their books would violate international copyright law, particularly in the case of out-of-print books, which Google proposed to market unless it received opt-out notification from the authors or their estates. The ASA met most of those objections by eliminating copyrighted books that were published abroad, except in the United Kingdom, Canada, and Australia. But foreigners continued to protest about the potential violation of their rights and noted that they, too, had an orphan book problem.

Fifth, the settlement was an attempt to resolve a class action suit, but the plaintiffs did not adequately represent the class to which they belonged. The Authors Guild has 8,000 members but the number of living writers who have published works during the last half century probably amounts to far more than one hundred thousand. As Judge Chin observed, 6,800 living writers—nearly as many as are members of the Guild—chose to opt out of Google Book Search, and many objected in memoranda to the court that they did not consider themselves represented by the Guild. Some, especially academics who do not live from their pens, said they cared more about the diffusion of their writing than about the small amounts that they might gain by sales.

Sixth, in the course of administering its sales, both of individual books and of access to its data base by means of institutional subscriptions, Google might abuse readers’ privacy by accumulating information about their behavior. Google could know who its readers were, precisely what they read, and when they did the reading. The ASA provided some assurances about this danger, but Judge Chin recommended more, should the ASA be revised and resubmitted to the court.

The cumulative effect of these objections, elaborated in 500 memoranda filed with the court and endorsed in large part by Judge Chin’s decision, could give the impression that the settlement, even in its amended version, is so flawed that it deserves to be pronounced dead and buried. Yet it has many positive features. Above all, it could provide millions of people with access to millions of books. If the price were moderate, the benefit would be extraordinary, and the result would give new life to old books, which rarely get consulted from their present locations on the remote shelves or distant storage facilities of research libraries. Google also committed itself to furnish its service free of charge on at least one terminal in all public libraries, to adapt the digitized texts to the needs of the visually impaired, and to make its data available for large-scale, quantitative research of the “non-consumptive” kind.

How can these advantages be preserved without the accompanying drawbacks? By creating a Digital Public Library of America (DPLA)—that is, a collection of works in all formats that would make our cultural heritage available online and free of charge to everyone everywhere.

Having argued so often for this alternative to Google Book Search, I may fall victim to the syndrome known in France as preaching for one’s own saint. Instead of repeating the arguments, I would like to show how the case for the DPLA would look if seen from the perspective of similar attempts in other countries.

The most impressive attempts to create national digital libraries are taking shape in Norway and The Netherlands. They have state support, and they involve plans to digitize books covered by copyright, even those that are currently in print, by means of collective agreements—not legalistic devices like the class action suit employed by Google and its partners, but voluntary arrangements, which reconcile the interests of the authors and publishers who own the rights with those of readers who want access to everything in their national languages. Of course, the number of books in Norwegian and Dutch is small compared with those in English. To form an idea of what could be done in the United States, it is better to study another venture, the pan-European digital library known as Europeana.

Europeana is still in a formative phase, but its basic structure is well developed. Instead of accumulating collections of its own, it will function as an aggregator of aggregators—that is, it will standardize data that flows in from providers in centralized locations, which themselves will have integrated data derived from many individual sources. Information will therefore be accumulated and coordinated at three levels: particular libraries will digitize their collections; national or regional centers will integrate them into central data bases; and Europeana will transform those data bases, from 27 constituent countries, into a single, seamless network. To the users, all these currents of information will remain invisible. They will simply search for an item—a book, an image, a recording, or a video—and the system will direct them to a digitized version of it, wherever it may be, making it available for downloading on a personal computer or a hand-held device.

To deliver such service, the system will require not only an effective technological architecture but also a way of coordinating the information required to locate the digitized items—“metadata,” as librarians call it. The staff of Europeana at The Hague has perfected a code to harmonize the metadata that will flow into it from every corner of the Continent. Unlike Google, it will not store digital files in a single data base or server farm. It will operate as a nerve center for what is known as a “distributed network,” leaving libraries, archives, and museums to digitize and preserve their own collections in the capillary system of the organic whole.
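For readers who think more easily in code, the three-level arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Europeana's actual software or metadata code: the `Record` fields, the `harmonize` function, the `Aggregator` class, the library names, and the example.org addresses are all hypothetical. The point it tries to capture is that the central index holds harmonized metadata and pointers, while the digitized files stay with the institutions that made them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Harmonized metadata: describes an item and points to where its file lives."""
    title: str
    kind: str    # e.g. "book", "image", "recording", "video"
    holder: str  # institution that digitized and preserves the file
    url: str     # location of the digitized file at that institution


def harmonize(raw: dict, field_map: dict, holder: str) -> Record:
    """Translate one library's local field names into the shared schema."""
    common = {target: raw[source] for source, target in field_map.items()}
    return Record(holder=holder, **common)


class Aggregator:
    """Collects harmonized records; the same class serves the national and top levels."""

    def __init__(self, name: str):
        self.name = name
        self.records: list[Record] = []

    def ingest_library(self, holder: str, raw_records: list, field_map: dict) -> None:
        # Level 1 -> level 2: a library's catalogue enters a national index.
        self.records.extend(harmonize(r, field_map, holder) for r in raw_records)

    def ingest_aggregator(self, other: "Aggregator") -> None:
        # Level 2 -> level 3: a national index is merged into the central one.
        self.records.extend(other.records)

    def search(self, term: str) -> list:
        # Only metadata is searched; the result points back to the holder's file.
        term = term.lower()
        return [r for r in self.records if term in r.title.lower()]


# Hypothetical data: two libraries with different local field names.
nl = Aggregator("Netherlands")
nl.ingest_library(
    "Library A",
    [{"titel": "Max Havelaar", "soort": "book", "link": "https://example.org/a/1"}],
    {"titel": "title", "soort": "kind", "link": "url"},
)
fr = Aggregator("France")
fr.ingest_library(
    "Library B",
    [{"titre": "Candide", "type": "book", "adresse": "https://example.org/b/7"}],
    {"titre": "title", "type": "kind", "adresse": "url"},
)

portal = Aggregator("portal")
portal.ingest_aggregator(nl)
portal.ingest_aggregator(fr)
hits = portal.search("candide")
```

A search at the portal returns a record whose `url` leads to the holding library, which is the sense in which the user never sees the layers underneath.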

A digital library for America might well follow this model, although Europeana has not yet proven its viability. When a prototype went live on November 20, 2008, it was flooded with so many attempts at searches that the system crashed. But that failure can be taken as testimony to the demand for such a mega-library. Since then, Europeana has enlarged its capacity. It will resume functioning at full tilt in the near future; and by 2015 it expects to make thirty million items, a third of them books, available free of charge.

Who will pay for it? The European Union, drawing on contributions from its member states. (Europeana’s current budget is 4,923,000 euros, but most of the expenses fall on the institutions that create and preserve the digital files.) This financial model may not be suitable for the United States, but we Americans benefit from something that Europe lacks, a rich array of independent foundations dedicated to the public welfare. By combining forces, a few dozen foundations could provide enough money to get the DPLA up and running. It is impossible at this point to provide even ballpark estimates of the overall cost, but it should come to less than the 750 million euros that President Sarkozy pledged for the digitization of France’s “cultural patrimony.”

Once its basic structure has been erected, the DPLA could be enlarged incrementally. And after it has proven its capacity to provide services—for education at all levels, for the information needs of businesses, for research in every conceivable field—it might attract public funds. Long-term sustainability would remain a problem to be solved.

Other problems must be confronted in the near future. As the Google case demonstrated, nearly everything published since 1923, when copyright restrictions begin to apply, is out of bounds for digitization and distribution. The DPLA must respect copyright. In order to succeed where Google failed, it will have to include several million orphan books; and it will not be able to do that unless Congress clears the way by appropriate legislation. Congress nearly passed orphan-book bills in 2006 and 2008. It failed in part because of the uncertainty surrounding Google Book Search. A not-for-profit digital library truly devoted to the public welfare could be of such benefit to their constituents that members of Congress might pass a new bill carefully designed to protect the DPLA from litigation should holders of rights to orphan books be located and bring suit for damages.

Even better, Congress could create a mechanism to compensate authors for the downloading of books that are out of print but covered by copyright. Voluntary collective agreements among authors of in-print books, similar to those in Norway and The Netherlands, could make much contemporary literature accessible through the DPLA. The copyright problems connected with works produced outside the United States might be resolved by agreements between the DPLA and Europeana as well as by similar alliances with aggregators on other continents. “Born digital” items in diverse formats (among them the growing number of ebooks that do not also appear in printed form) pose still more problems. But the non-commercial character of the DPLA and its commitment to the public good would make all such difficulties look less formidable than they seemed to be when they were confronted by a company intent on maximizing profit at the expense of the public.

In short, the collapse of the settlement has a great deal to teach us. It should help us emulate the positive aspects of Google Book Search and avoid the pitfalls that made Google’s enterprise flawed from the beginning. The best way to do so and to provide the American people with what they need in order to thrive in the new information age is to create a Digital Public Library of America.

An extended version of this post will appear in the April 28 issue of the New York Review.
