In a famous letter of 1813, Thomas Jefferson compared the spread of ideas to the way people light one candle from another: “He who receives an idea from me, receives instruction himself without lessening mine; as he who lites his taper at mine, receives light without darkening me.”
The eighteenth-century ideal of spreading light may seem archaic today, but it can acquire a twenty-first-century luster if one associates it with the Internet, which transmits messages at virtually no cost. And if Internet enthusiasm sounds suspiciously idealistic, one can extend the chain of associations to a key concept of modern economics—that of a public good. Public goods such as clean air, efficient roads, hygienic sewage disposal, and adequate schooling benefit the entire citizenry, and one citizen’s benefit does not diminish that of another. Public goods are not assets in a zero-sum game, but they do carry costs—up-front costs, usually paid through taxation, at the production end of the services and facilities that the public enjoys as users. The Jeffersonian ideal of access to knowledge as a public good does not mean that knowledge has no cost. We enjoy freedom of information, but information is not free. Someone had to pay for Jefferson’s taper.
I stress that point, because I want to offer a work-in-progress report on the Digital Public Library of America (DPLA) and to argue that it is a feasible, affordable project as well as an opportunity to realize the Enlightenment ideals on which our country was founded.
Although fantasies about a mega-meta-macro library go back to the ancients, the possibility of actually constructing one is recent. It dates from the creation of the Internet (1974) and the Web (1991). Google demonstrated that the new technology could be harnessed to create a new kind of library, one that, in principle, could contain all the books in existence. But Google Book Search is a story of a good idea gone bad. As first conceived, it promised to do what Google does best: search for pertinent information. Google would digitize millions of books provided for free from research libraries, and users would be able to locate material in them by entering key words and examining short snippets called up from the database.
Google would not make available the full texts of the books, and it might even indicate where they could be found in the nearest library. But because most of the books were covered by copyright, the Authors Guild and the Association of American Publishers (AAP) brought suit for alleged infringement of their intellectual property. Google could have defended itself by invoking the doctrine of “fair use.” That would have been tricky business, to be sure, because the defense hangs on arguments based on sections 107 and 108 of the Copyright Act of 1976, whose obscurities and ambiguities have occupied lawyers for decades. But Google could have hired the best lawyers in the country to make a convincing case; and if it won, it would have scored a double victory for the public good: it would have promoted the accessibility of literature, and it would have established a broad and firm legal basis for the fair use of that literature.
Instead, Google chose the path of commercialization. After three years of secret negotiations with the plaintiffs, it reached a settlement that transformed the original search operation into a commercial speculation based on the value of the database of books. Access to the texts of the books would be sold back to libraries, including the libraries that had originally provided them free of charge, for an annual subscription fee, which would be set by representatives of the authors and publishers along with Google. Because the subscription would be free of pressure from competition and of oversight by any public body, its cost could escalate as disastrously as the price of academic journals has in the last two decades. The settlement therefore came down to an agreement about how to divide a pie: 37 percent of the profits would go to Google, 63 percent to the Authors Guild and the AAP.
The settlement had to be accepted by a federal court, because it involved a class action suit, and a judge had to verify that the Authors Guild and the AAP represented authors and publishers in general. The guild has only 8,000 members, but several hundred thousand Americans have published at least one book, and 6,800 authors, acting independently, had taken advantage of an opt-out clause in the settlement by notifying Google that they did not want to participate in its enterprise. Conflicting interests made it difficult to believe that the plaintiffs spoke for any coherent class. Judge Denny Chin of the U.S. District Court for the Southern District of New York therefore rejected the settlement in a decision announced on March 22, 2011. He also emphasized other, equally strong objections to it, including the fact that it threatened to constitute a monopoly and that it would give Google exclusive control over orphan works, that is, books whose copyright owners have not been identified.
So far, Google and the plaintiffs have failed to rework the settlement in a way that would make it acceptable to the court. At a hearing on September 15, Judge Chin set a trial schedule for the resumption of the original suit by the Authors Guild and the AAP, which would extend proceedings until next July. The publishers have indicated that they might reach a separate settlement with Google, but the Authors Guild appears to be less ready to compromise. And on August 17, a parallel class action suit over copyright, brought by a group of freelance writers, was rejected by another court in New York on the same grounds—namely, that the plaintiffs did not constitute a “class” with consistent interests. The legal obstacles therefore seem formidable. It may be too early to declare Google Book Search dead, but I do not see how it can be revived.
Whatever the fate of Google’s attempt to commercialize access to digitized books, the time has come to relight Jefferson’s taper. We now have it in our power to create a digital library that will make our cultural heritage available, free of charge, to all Americans—and to the entire world.
On October 1, 2010, a group of librarians, foundation heads, and computer scientists met at Harvard to discuss the possibility of constructing a Digital Public Library of America. The basic idea was simple: form a coalition of foundations to provide the funding; form a coalition of libraries to supply the books. But the task is enormously complex. After taking its measure, the group formed a steering committee to provide general guidance and to recruit support from diverse constituencies scattered around the country. A secretariat was appointed and set to work with the help of a grant from the Sloan Foundation to organize study of the most difficult questions. Six working groups produced reports, which cleared the way for a master plan.
A preliminary version of the plan was presented to the public on October 21, 2011, at a meeting in Washington hosted by the National Archives with the support of the Library of Congress, the National Endowment for the Humanities, and the Institute of Museum and Library Services. By now, therefore, it is possible to have a clear view, or at least a preview, of the DPLA’s most important features. Here are some thoughts—my own, not those of the steering committee—about five of them.
Scope and Content
The DPLA will not draw on one gigantic database. It will be a distributed system, which will aggregate collections from many research libraries, museums, and other institutions. It will provide one-click access to documents in many formats, including images, recordings, and videos. At first, however, it will consist primarily of books in the public domain. Google digitized about two million of them, and copies of its digital files have been deposited at HathiTrust, a digital repository set up in Michigan to preserve the output of Google’s digitizing and of other digital projects in sixty partner libraries. Although Hathi’s mission is preservation, an agreement might be reached for it to make its holdings available to the DPLA. The Internet Archive, a not-for-profit, open-access digitizing operation founded by the computer engineer and digital librarian Brewster Kahle, can also make available millions of files.
Research libraries everywhere have digitized great swaths of their special collections independently of Google. For example, Harvard has digitized and made freely accessible 2.3 million pages of public-domain material for its Open Collections Program, and it is cooperating with China in a program to digitize 51,500 rare Chinese works from its Yenching Library. Government sources are particularly rich. All fifty states have digitized most of their newspaper archives, and their holdings have been aggregated by the Library of Congress, which has offered to make this great trove of information available to the DPLA. By combining these and other sources, the DPLA can lay a foundation of incomparable depth and breadth.
Unfortunately, copyright laws prevent the public domain from extending beyond 1923. Most twentieth-century literature will therefore remain out of bounds for the DPLA, unless some legal way can be found to include it. And even assuming that copyright could be adjusted, where should the boundary be drawn? Participants at the Washington meeting stressed that nothing as yet has been excluded in discussions about the scope of the DPLA’s holdings. Some have argued that they should stretch right up to the present, provided that an agreement can be reached to compensate rights holders. Were that possible, the DPLA would become a truly “public” library for the entire country. But it also might alienate the public libraries that already exist, because of the danger that local authorities could cut the funding for their libraries on the erroneous pretext that the DPLA will provide their basic material.
For my part, I think the DPLA’s mission should be defined in a manner that would make its services clearly distinct from those of existing public libraries. It should leave them to supply their users with current material—whether best-selling novels or magazines or DVDs—and supplement that function by providing free access to the general corpus of books that constitute the world’s literary heritage. Where then would its collections stop? Most books go out of print with astonishing rapidity. In fact, if they make it into bookstores (most don’t), their shelf life is often a matter of days; and few of them continue to sell, even as e-books, after a year. I suggest that the DPLA exclude everything published within the last five or ten years, and that a moving wall, which would advance a year at a time, keep it from interfering in the current market.
When the DPLA opens, as we expect, in 2013, it will probably contain only a basic stock of public-domain works and special collections furnished at a minimal cost by research libraries. From that point onward, it will grow as fast as the funding permits, but its initial expenses will be devoted for the most part to the creation of its technical architecture and administration. It will be designed in a way that will make it compatible with major digital libraries in other countries. In fact, it has already reached an agreement to cooperate with Europeana, the pan-European digital library that aggregates collections from twenty-seven countries. Europeana now runs on a budget of €5 million a year, but it does not become directly involved in digitization, collection development, or preservation. It leaves those functions to the national collections. The example of Europeana therefore suggests the bare minimum of what it will cost to get the DPLA up and running.