What would it cost if the DPLA led a major effort to digitize books that are covered by copyright but out of print, assuming there were no legal impediments? Brewster Kahle, who has digitized more than a million works for his Internet Archive, says he can digitize a book for ten cents a page or $30 for an ordinary work of about three hundred pages, and he estimates that he could digitize the entire contents of a great library—one with ten million volumes, somewhat larger than that of Princeton and smaller than Yale’s—for $300 million.
Other experts find those costs too low. They consider a dollar a page a conservative estimate; and they note that, aside from the scanning, a great deal of work must be done to perfect the metadata and to assure preservation, not to mention other possible services such as curation and the development of apps. But the costs of digitization and preservation are decreasing, and the technology is improving. The DPLA will begin with a base of several million volumes, and it will grow incrementally by digitizing at a rate that conforms to its budget. What will that budget be? No one will know until a business plan is perfected sometime before April 2013.
By combining ballpark and back-of-the-envelope estimates, one could imagine digitizing a million books a year on an annual budget of $75–100 million. (The budget of the Library of Congress in fiscal 2010 came to $684.3 million.) If a grand coalition of foundations contributed $100 million a year, a great library would exist within a decade. Double that rate, and the library soon would be the greatest that ever existed. But we needn’t rush. We must do the job right, because the DPLA should last for centuries, and, if necessary, it could grow gradually on a budget of not more than $5–10 million a year. A coalition of foundations could provide that much money; and once the DPLA had proved its usefulness, it might tap other sources, perhaps private industry or even Congress.
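The arithmetic behind these estimates can be checked with a minimal sketch. It uses only the figures quoted above (ten cents or a dollar a page, three hundred pages a book, ten million volumes, a million books a year); the function and constant names are illustrative assumptions, and the calculation covers per-page cost alone, not the metadata, preservation, or curation work that the skeptics mention.

```python
# Figures quoted in the essay. Amounts are kept in whole cents so that the
# arithmetic stays exact (no floating-point rounding).
PAGES_PER_BOOK = 300             # "an ordinary work of about three hundred pages"
KAHLE_CENTS_PER_PAGE = 10        # Kahle's estimate: ten cents a page
SKEPTIC_CENTS_PER_PAGE = 100     # other experts: a dollar a page
GREAT_LIBRARY_VOLUMES = 10_000_000


def digitization_cost_dollars(volumes, cents_per_page, pages_per_book=PAGES_PER_BOOK):
    """Total per-page digitization cost, in whole dollars, for a collection."""
    return volumes * pages_per_book * cents_per_page // 100


def implied_cents_per_page(annual_budget_dollars, books_per_year,
                           pages_per_book=PAGES_PER_BOOK):
    """Per-page cost in cents (rounded down) implied by a budget and a yearly output."""
    return annual_budget_dollars * 100 // (books_per_year * pages_per_book)


def years_to_reach(volumes, books_per_year):
    """Years needed to digitize a collection at a steady yearly rate."""
    return volumes / books_per_year


print(digitization_cost_dollars(1, KAHLE_CENTS_PER_PAGE))                        # 30
print(digitization_cost_dollars(GREAT_LIBRARY_VOLUMES, KAHLE_CENTS_PER_PAGE))    # 300000000
print(digitization_cost_dollars(GREAT_LIBRARY_VOLUMES, SKEPTIC_CENTS_PER_PAGE))  # 3000000000
print(implied_cents_per_page(75_000_000, 1_000_000))                             # 25
print(implied_cents_per_page(100_000_000, 1_000_000))                            # 33
print(years_to_reach(GREAT_LIBRARY_VOLUMES, 1_000_000))                          # 10.0
```

On these assumptions, the $75–100 million budget for a million books a year works out to roughly 25 to 33 cents a page, which falls between Kahle's figure and the skeptics' dollar a page.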
The DPLA must respect copyright. How far it can go in making accessible books that are out of print but covered by copyright depends on the interpretation of copyright laws by the courts and the possibility of modifying them by congressional action. The history of copyright in the United States goes back to article 1, section 8, clause 8 of the Constitution, which sets two objectives: “to promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.” The first copyright law, passed in 1790, struck a balance between those objectives by giving authors an exclusive right to the sale of their work for fourteen years, renewable once.
At that time, Jefferson’s taper was burning bright, and American statesmen took heed of British precedent. Parliament, in the original copyright law of 1710, had adopted a limit of twenty-one years for work already in print and fourteen years for all work published after the law’s enactment, renewable once for another fourteen years so long as the author was still alive. Claims for perpetual copyright had been debated in a series of British court cases until they were definitively rejected in the great decision of Donaldson v. Becket in 1774. During the debate over the Sonny Bono Copyright Term Extension Act of 1998, Jack Valenti, the head of the Hollywood lobby, was asked how long he thought copyrights should last if they could not be perpetual. “Forever minus one day,” he reportedly replied. Since then, the flame of Jefferson’s taper has nearly died out.
The current limit of copyright—the life of the author plus seventy years—tips the balance decisively in favor of private interests at the expense of the public good. The public domain extends only to 1923. Every book published after 1963 is now covered by copyright, whether or not its copyright has been renewed, according to congressional acts of 1976, 1992, and 1998. The status of many books published between 1923 and 1964 remains ambiguous, because at that time copyrights had to be renewed, and the record of renewals does not leave a clear trail leading to the copyright holders today, if any have survived. Hence the problem of orphans.
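The three periods described here can be restated as a small lookup, offered only as a simplified sketch of the rules as this essay states them at the time of writing. The function name and the status labels are hypothetical, the treatment of the boundary years (works before 1923 in the public domain, renewal required through 1963) is an assumption about how to read the dates above, and the sketch ignores every other factor that can affect the status of a book.

```python
def us_copyright_status(publication_year: int) -> str:
    """Rough status of a book published in the United States, by year.

    Simplified restatement of the periods described in the essay; it is
    not legal advice and ignores everything except the publication year.
    """
    if publication_year < 1923:
        # "The public domain extends only to 1923."
        return "public domain"
    if publication_year <= 1963:
        # Copyright had to be renewed, and the renewal record is often
        # unclear: the period that produces most orphan books.
        return "ambiguous: depends on renewal"
    # Covered by copyright whether or not it was renewed.
    return "in copyright"


for year in (1910, 1923, 1950, 1963, 1964, 1990):
    print(year, us_copyright_status(year))
```

The middle branch is the one that matters for the DPLA, because it cannot be resolved from the publication year alone; someone has to trace the renewal record and the current rights holder, if any.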
Further legislation could solve the problem. But lobbyists had such a heavy hand in attempts to pass orphan book legislation in 2006 and 2008 that some consider it impossible to redress the balance of copyright law in a way that would “promote the progress of science and useful arts.” The only recourse may be to sections 107 and 108 of the copyright law of 1976, which, as mentioned, open the way for the “fair use” of copyrighted materials. Unfortunately, that way passes over some very uncertain terrain. (A study group on section 108 composed of librarians and lawyers worked through the problems for two years and came up with some proposals but nothing that has had any effect.)
“Fair use” normally applies to noncommercial activities such as criticism, scholarship, and teaching. Google’s original, search-and-snippets enterprise involved accompanying advertisements intended to bring in revenue for a profit-minded business. By contrast, the DPLA will be a not-for-profit association dedicated to the public good, and therefore it might stand a better chance with a fair-use defense, in case it should be sued by owners of rights to books that it had digitized in the mistaken belief that they were orphans. But should the DPLA run such a risk?
Probably not. Congressional legislation dealing with orphan books might provide immunity from litigation and set up an escrow fund to compensate rights holders of books that had been treated as orphans. And if Congress will not act, the DPLA could try to reach an agreement with authors and publishers whose copyrighted books have gone out of print. Google had attempted to do so in the settlement, which included an opt-out default: all authors were deemed to have accepted the terms of the settlement unless they notified Google to the contrary. But how many writers or their descendants would be aware that they could claim rights? This aspect of the case especially troubled Judge Chin, because it seemed to give Google monopolistic control over nearly the entire body of orphan books—and there probably are more than a million of them. Could an opt-out provision pass muster if it were applied for the benefit of the public by a not-for-profit organization?
Again the answer is probably no. But a solution might be found in legal arrangements known as extended collective licenses (ECL), which have been successfully developed in the Scandinavian countries. In Norway, a broad-based association of authors allied with publishers has developed an ECL that represents the interests of all copyright owners in digitizing and making accessible, free of charge, all Norwegian books to readers located in Norway. The rights holders will be compensated from a fund according to a fixed fee per page of use by readers, who can consult the texts on their screens but not download them, and authors can opt out of the system. In some respects—the creation of a “class” that represents all authors and the opt-out default—the Norwegian program resembles Google Book Search, except that it was authorized by legislation and is subject to government oversight.
Of course, the United States has little experience with collective management of rights, although the Copyright Clearance Center and JSTOR—the Mellon Foundation’s program for digitizing scholarly journals—might provide models, and America’s culture is much less homogeneous than Norway’s. The Authors Guild may refuse to yield an inch in defending the interests of professional authors. But if there were to be a national Digital Public Library, most authors probably would prefer to have digitized versions of their out-of-print books made available for a small fee or even for free, rather than to leave them languishing unread on the shelves of a few libraries. Above all, authors want readers, and the minority of authors who live from their pens could opt out of this arrangement. Some of the best legal minds are now developing plans for an American ECL regime, which would make it possible for our national digital library to include everything published in the twentieth century.
Last June the steering committee of the DPLA opened an international “Beta Sprint” competition for the best pilot projects, tools, and tentative blueprints of the infrastructure that will hold the system together and make it operate seamlessly for users. More than sixty potential applicants expressed interest. Nearly forty submitted projects by the deadline of September 1. A panel of experts from around the country selected the six most promising projects, and the six were presented to the public at the general meeting in Washington on October 20–21. The technical subcommittee of the DPLA will oversee the effort to cull and combine the best ideas of the winners and to come up with a draft prototype by April 2012. The prototype will be perfected during the following twelve months, and it should be ready to go into operation when the DPLA is launched in April 2013.
The race to this deadline may seem breathless, but it is fueled by enthusiasm and energy. Leading figures in computer science, information technology, and library science have assured us that the task is doable, and we will get it done.
I have arrived at the last of my five topics, and here I must be brief, because the governance committee of the DPLA has only begun to study the possibilities for administering it after it is launched a year and a half from now. Where should it be located? Who should lead it? To whom should it be responsible? How will it formulate policy and administer its services?
The present secretariat, under the able leadership of John Palfrey, head of the Harvard Law School Library, will continue to direct affairs during the final eighteen months of the embryonic DPLA’s existence—extending from the launch on October 21 until the DPLA’s opening in April 2013. By April 2013, the newly born DPLA will have set up headquarters—probably at a considerable distance from Harvard. The Harvard phase of its existence had to do with its original conception by a group of self-appointed enthusiasts. The mature DPLA will belong to the entire country and will serve a broad constituency, including ordinary readers, independent researchers, the multifaceted public of public libraries, K–12 schoolchildren, students in community colleges, university students and faculty, and book lovers of every stripe.
In order to fulfill its broad mission, the DPLA will probably be responsible to a board of trustees representing a wide variety of interests. It will need a staff of professionals and, no doubt, a director with plenty of expertise and energy. Just how the trustees and staff will be chosen will depend on the kind of legal structure that emerges. The library might be absorbed by an NGO that has a strong record of excellence in library affairs, or it could operate as an independent corporation by taking advantage of section 501(c)(3) of the Internal Revenue Code, which favors nonprofit organizations. At present, most people think it should not be part of the federal government so that it will be free from political pressures. It might resemble the National Academy of Sciences or the BBC.
In fact, however, it won’t resemble anything, because nothing like it has ever existed. A library without walls that will extend everywhere and contain nearly everything available in the walled-in repositories of human culture… E pluribus unum! Jefferson would have loved it.