A World Digital Library Is Coming True! | Robert Darnton

Robert Dawson
The first Little Free Library, inviting visitors to ‘take a book, leave a book,’ Hudson, Wisconsin, 2012; photograph by Robert Dawson from his book *The Public Library: A Photographic Essay*, just published by Princeton Architectural Press

In the scramble to gain market share in cyberspace, something is getting lost: the public interest. Libraries and laboratories—crucial nodes of the World Wide Web—are buckling under economic pressure, and the information they diffuse is being diverted away from the public sphere, where it can do most good.

Not that information comes free or “wants to be free,” as Internet enthusiasts proclaimed twenty years ago.¹ It comes filtered through expensive technologies and financed by powerful corporations. No one can ignore the economic realities that underlie the new information age, but who would argue that we have reached the right balance between commercialization and democratization?

Consider the cost of scientific periodicals, most of which are published exclusively online. It has increased at four times the rate of inflation since 1986. The average price of a year’s subscription to a chemistry journal is now $4,044. In 1970 it was $33. A subscription to the Journal of Comparative Neurology cost $30,860 in 2012—the equivalent of six hundred monographs. Three giant publishers—Reed Elsevier, Wiley-Blackwell, and Springer—publish 42 percent of all academic articles, and they make giant profits from them. In 2013 Elsevier turned a 39 percent profit on an income of £2.1 billion from its science, technical, and medical journals.
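The figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 44-year span (1970 to 2014) is an assumption, since the text gives no year for the $4,044 price, and the variable names are invented for the example.

```python
# Figures quoted in the paragraph above.
price_1970 = 33.0     # average chemistry journal subscription in 1970 (USD)
price_now = 4044.0    # average chemistry journal subscription "now" (USD)
years = 2014 - 1970   # ASSUMPTION: "now" is taken to mean 2014

# Overall growth and the implied compound annual rate of increase.
growth_factor = price_now / price_1970
annual_rate = growth_factor ** (1 / years) - 1

# Journal of Comparative Neurology: price expressed per monograph.
jcn_price = 30860.0
monographs = 600
price_per_monograph = jcn_price / monographs

# Elsevier: 39 percent profit on an income of 2.1 billion pounds.
elsevier_income_gbp_bn = 2.1
elsevier_margin = 0.39
elsevier_profit_gbp_bn = elsevier_income_gbp_bn * elsevier_margin

print(f"Growth factor since 1970: {growth_factor:.1f}x")
print(f"Implied annual increase: {annual_rate:.1%}")
print(f"Implied price per monograph: ${price_per_monograph:.2f}")
print(f"Elsevier profit: about {elsevier_profit_gbp_bn:.2f} billion pounds")
```

Under that assumption the average subscription price rose more than 120-fold, which works out to between 11 and 12 percent a year, compounded.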

All over the country research libraries are canceling subscriptions to academic journals, because they are caught between decreasing budgets and increasing costs. The logic of the bottom line is inescapable, but there is a higher logic that deserves consideration—namely, that the public should have access to knowledge produced with public funds.

Congress acted on that principle in 2008, when it required that articles based on grants from the National Institutes of Health be made available, free of charge, from an open-access repository, PubMed Central. But lobbyists blunted that requirement by getting the NIH to accept a twelve-month embargo, which would prevent public accessibility long enough for the publishers to profit from the immediate demand.

Not content with that victory, the lobbyists tried to abolish the NIH mandate in the so-called Research Works Act, a bill introduced in Congress in November 2011 and championed by Elsevier. The bill was withdrawn two months later following a wave of public protest, but the lobbyists are still at work, trying to block the Fair Access to Science and Technology Research Act (FASTR), which would give the public free access to all research, the data as well as the results, funded by federal agencies with research budgets of $100 million or more.

FASTR is a successor to the Federal Research Public Access Act (FRPAA), which remained bottled up in Congress after being introduced in three earlier sessions. But the basic provisions of both bills were adopted by a White House directive issued by the Office of Science and Technology Policy on February 22, 2013, and due to take effect at the end of this year. In principle, therefore, the results of research funded by taxpayers will be available to taxpayers, at least in the short term. What is the prospect over the long term? No one knows, but there are signs of hope.

The struggle over academic journals should not be dismissed as an “academic question,” because a great deal is at stake. Access to research drives large sectors of the economy—the freer and quicker the access, the more powerful its effect. The Human Genome Project cost $3.8 billion in federal funds to develop, and thanks to the free accessibility of the results, it has already produced $796 billion in commercial applications. Linux, the free, open-source software system, has brought in billions in revenue for many companies, including Google. Less spectacular but more widespread is the multiplier effect of free information on small and medium businesses that cannot afford to pay for information hoarded behind subscription walls. A delay of a year in access to research and data can be prohibitively expensive for them. According to a study completed in 2006 by John Houghton, a specialist in the economics of information, a 5 percent increase in the accessibility of research would have produced an increase in productivity worth $16 billion.
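As a rough check on the scale of the Human Genome Project figures cited above, the ratio of commercial output to federal cost can be computed directly. This is a crude gross ratio, not a formal return-on-investment calculation, and the variable names are invented for the example.

```python
# Figures quoted in the paragraph above (billions of US dollars).
federal_cost_bn = 3.8
commercial_applications_bn = 796.0

# Dollars of commercial activity per federal dollar spent.
output_per_dollar = commercial_applications_bn / federal_cost_bn

print(f"About ${output_per_dollar:.0f} in applications per federal dollar")
```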

Yet accessibility may decrease, because the price of journals has escalated so disastrously that libraries—and also hospitals, small-scale laboratories, and data-driven enterprises—are canceling subscriptions. Publishers respond by charging still more to institutions with budgets strong enough to carry the additional weight. But the system is breaking down. In 2010, when the Nature Publishing Group told the University of California that it would increase the price of its sixty-seven journals by 400 percent, the libraries stood their ground, and the faculty, which had contributed 5,300 articles to those journals during the previous six years, began to organize a boycott.

The libraries and the publisher eventually reached a compromise, but the relentless increases continued to produce protests in the US and Europe. In France the University Pierre et Marie Curie recently canceled its subscription to Science when faced with a 100 percent increase, and the University of Paris V dropped subscriptions to three thousand journals. At Harvard, where e-journal subscriptions cost $9.9 million a year, the Faculty Advisory Council on the Library passed a resolution condemning the price increases as unsustainable.

In the long run, journals can be sustained only through a transformation of the economic basis of academic publishing. The current system developed as a component of the professionalization of academic disciplines in the nineteenth century. It served the public interest well through most of the twentieth century, but it has become dysfunctional in the age of the Internet. In fields like physics, most research circulates online in prepublication exchanges, and articles are composed with sophisticated programs that produce copy-ready texts. Costs are low enough for access to be free, as illustrated by the success of arXiv, a repository of articles in physics, mathematics, computer science, quantitative biology, quantitative finance, and statistics. (The articles do not undergo full-scale peer review unless, as often happens, they are later published by conventional journals.)

The entire system of communicating research could be made less expensive and more beneficial for the public by a process known as “flipping.” Instead of subsisting on subscriptions, a flipped journal covers its costs by charging processing fees before publication and making its articles freely available, as “open access,” afterward. That will sound strange to many academic authors. Why, they may ask, should we pay to get published? But they may not understand the dysfunctions of the present system, in which they furnish the research, writing, and refereeing free of charge to the subscription journals and then buy back the product of their work—not personally, of course, but through their libraries—at an exorbitant price. The public pays twice—first as taxpayers who subsidize the research, then as taxpayers or tuition payers who support public or private university libraries.

By creating open-access journals, a flipped system directly benefits the public. Anyone can consult the research free of charge online, and libraries are liberated from the spiraling costs of subscriptions. Of course, the publication expenses do not evaporate miraculously, but they are greatly reduced, especially for nonprofit journals, which do not need to satisfy shareholders. The processing fees, which can run to a thousand dollars or more, depending on the complexities of the text and the process of peer review, can be covered in various ways. They are often included in research grants to scientists, and they are increasingly financed by the author’s university or a group of universities.

At Harvard, a program called HOPE (Harvard Open-Access Publishing Equity) subsidizes processing fees. A consortium called COPE (Compact for Open-Access Publishing Equity) promotes similar policies among twenty-one institutions, including MIT, the University of Michigan, and the University of California at Berkeley; and its activities complement those of thirty-three similar funds in institutions such as Johns Hopkins University and the University of California at San Francisco.

The main impediment to public-spirited publishing of this kind is not financial. It involves prestige. Scientists prefer to publish in expensive journals like Nature, Science, and Cell, because the aura attached to them glows on CVs and promotes careers. But some prominent scientists have undercut the prestige effect by founding open-access journals and recruiting the best talent to write and referee for them. Harold Varmus, a Nobel laureate in physiology or medicine, has made a huge success of Public Library of Science, and Paul Crutzen, a Nobel laureate in chemistry, has done the same with Atmospheric Chemistry and Physics. They have proven the feasibility of high-quality, open-access journals. Their journals not only cover costs through processing fees but also produce a profit, or rather a “surplus,” which they invest in further open-access projects.

The pressure for open access is also building up from digital repositories, which are being established in universities throughout the country. In February 2008, the Faculty of Arts and Sciences at Harvard voted unanimously to require its members (with a proviso for opting out or for accepting embargoes imposed by commercial journals) to deposit peer-reviewed articles in a repository, DASH (Digital Access to Scholarship at Harvard), where they can be read by anyone free of charge.

DASH now includes 17,000 articles, and it has registered three million downloads from countries in every continent. Repositories in other universities also report very high scores in their counts of downloads. They make knowledge available to a broad public, including researchers who have no connection to an academic institution; and at the same time, they make it possible for writers to reach far more readers than would be possible by means of subscription journals.

The desire to reach readers may be one of the most underestimated forces in the world of knowledge. Aside from journal articles, academics produce a large number of books, yet they rarely make much money from them. Authors in general derive little income from a book a year or two after its publication. Once its commercial life has ended, it dies a slow death, lying unread, except for rare occasions, on the shelves of libraries, inaccessible to the vast majority of readers. At that stage, authors generally have one dominant desire: for their work to circulate freely through the public; and their interest coincides with the goals of the open-access movement. A new organization, Authors Alliance, is about to launch a campaign to persuade authors to make their books available online at some point after publication through nonprofit distributors like the Digital Public Library of America, of which more later.

All sorts of complexities remain to be worked out before such a plan can succeed: How to accommodate the interests of publishers, who want to keep books on their backlists? Where to leave room for rights holders to opt out and for the revival of books that take on new economic life? Whether to devise some form of royalties, as in the extended collective licensing programs that have proven to be successful in the Scandinavian countries? It should be possible to enlist vested interests in a solution that will serve the public interest, not by appealing to altruism but rather by rethinking business plans in ways that will make the most of modern technology.

Several experimental enterprises illustrate possibilities of this kind. Knowledge Unlatched gathers commitments and collects funds from libraries that agree to purchase scholarly books at rates that will guarantee payment of a fixed amount to the publishers who are taking part in the program. The more libraries participating in the pool, the lower the price each will have to pay. While electronic editions of the books will be available everywhere free of charge through Knowledge Unlatched, the subscribing libraries will have the exclusive right to download and print out copies. By the end of February, more than 250 libraries had signed up to purchase a pilot collection of twenty-eight new books produced by thirteen publishers, and Knowledge Unlatched headquarters, located in London, announced that it would soon scale up its operations with the goal of combining open access with sustainability.
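The pooling principle described above (a fixed payment to publishers, shared among however many libraries join) can be sketched in a few lines. The fee used here is purely hypothetical, since the article does not give the actual amounts, and the function name is invented for the example.

```python
def price_per_library(total_fee: float, libraries: int) -> float:
    """Split a fixed payment to publishers evenly among participating libraries."""
    if libraries <= 0:
        raise ValueError("at least one library must participate")
    return total_fee / libraries


# HYPOTHETICAL total owed to publishers for a pilot collection (USD).
TOTAL_FEE = 336_000.0

for n in (100, 250, 500):
    print(f"{n} libraries -> ${price_per_library(TOTAL_FEE, n):,.2f} each")
```

Because the total is fixed, each additional participant lowers the share paid by every other library, which is the incentive the scheme relies on.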

OpenEdition Books, located in Marseille, operates on a somewhat similar principle. It provides a platform for publishers who want to develop open-access online collections, and it sells the e-content to subscribers in formats that can be downloaded and printed. Operating from Cambridge, England, Open Book Publishers also charges for PDFs, which can be used with print-on-demand technology to produce physical books, and it applies the income to subsidies for free copies online. It recruits academic authors who are willing to provide manuscripts without payment in order to reach the largest possible audience and to further the cause of open access.

The famous quip of Samuel Johnson, “No man but a blockhead ever wrote, except for money,” no longer has the force of a self-evident truth in the age of the Internet. By tapping the goodwill of unpaid authors, Open Book Publishers has produced forty-one books in the humanities and social sciences, all rigorously peer-reviewed, since its foundation in 2008. “We envisage a world in which all research is freely available to all readers,” it proclaims on its website.

The same goal animates the Digital Public Library of America, which aims to make available all the intellectual riches accumulated in American libraries, archives, and museums. As reported in these pages, the DPLA was launched on April 18, 2013.² Now that it has celebrated its first anniversary, its collections include seven million books and other objects, three times the amount that it offered when it went online a year ago. They come from more than 1,300 institutions located in all fifty states, and they are being widely used: nearly a million distinct visitors have consulted the DPLA’s website (dp.la), and they come from nearly every country in the world (North Korea, Chad, and Western Sahara are the only exceptions).

At the time of its conception in October 2010, the DPLA was seen as an alternative to one of the most ambitious projects ever imagined for commercializing access to information: Google Book Search. Google set out to digitize millions of books in research libraries and then proposed to sell subscriptions to the resulting database. Having provided the books to Google free of charge, the libraries would then have to buy back access to them, in digital form, at a price to be determined by Google and that could escalate as disastrously as the prices of scholarly journals.

Google Book Search actually began as a search service, which made available only snippets or short passages of books. But because many of the books were covered by copyright, Google was sued by the rights holders; and after lengthy negotiations the plaintiffs and Google agreed on a settlement, which transformed the search service into a gigantic commercial library financed by subscriptions. But the settlement had to be approved by a court, and on March 22, 2011, the Southern Federal District Court of New York rejected it on the grounds that, among other things, it threatened to constitute a monopoly in restraint of trade. That decision put an end to Google’s project and cleared the way for the DPLA to offer digitized holdings—but nothing covered by copyright—to readers everywhere, free of charge.

Aside from its not-for-profit character, the DPLA differs from Google Book Search in a crucial respect: it is not a vertical organization erected on a database of its own. It is a distributed, horizontal system, which links digital collections already in the possession of the participating institutions, and it does so by means of a technological infrastructure that makes them instantly available to the user with one click on an electronic device. It is fundamentally horizontal, both in organization and in spirit.

Instead of working from the top down, the DPLA relies on “service hubs,” or small administrative centers, to promote local collections and aggregate them at the state level. “Content hubs” located in institutions with collections of at least 250,000 items—for example, the New York Public Library, the Smithsonian Institution, and the collective digital repository known as HathiTrust—provide the bulk of the DPLA’s holdings. There are now two dozen service and content hubs, and soon, if financing can be found, they will exist in every state of the union.

Such horizontality reinforces the democratizing impulse behind the DPLA. Although it is a small, nonprofit corporation with headquarters and a minimal staff in Boston, the DPLA functions as a network that covers the entire country. It relies heavily on volunteers. More than a thousand computer scientists collaborated free of charge in the design of its infrastructure, which aggregates metadata (catalog-type descriptions of documents) in a way that allows easy searching.
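The idea of aggregating metadata rather than the documents themselves can be illustrated with a toy index. This is a minimal sketch, not the DPLA's actual design: the hub names, record fields, and URLs are all hypothetical. Each hub keeps its own holdings, and the central index stores only short descriptions that point back to them.

```python
from collections import defaultdict

# HYPOTHETICAL metadata records as two hubs might supply them.
hub_records = {
    "hub_a": [
        {"id": "a1", "title": "Map of Boston Harbor",
         "url": "https://example.org/hub_a/a1"},
    ],
    "hub_b": [
        {"id": "b1", "title": "Pamphlet on the Boston Tea Party",
         "url": "https://example.org/hub_b/b1"},
    ],
}


def build_index(records_by_hub):
    """Merge every hub's records into one catalog and a word-to-record index."""
    index = defaultdict(set)
    catalog = {}
    for hub, records in records_by_hub.items():
        for record in records:
            key = (hub, record["id"])
            catalog[key] = record
            for word in record["title"].lower().split():
                index[word].add(key)
    return index, catalog


def search(index, catalog, query):
    """Return titles of records whose titles contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(catalog[key]["title"] for key in hits)


index, catalog = build_index(hub_records)
print(search(index, catalog, "boston"))
print(search(index, catalog, "boston map"))
```

The documents never leave the hubs. A search consults only the merged descriptions, and each result carries a link back to the institution that holds the item.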

Therefore, for example, a ninth-grader in Dallas who is preparing a report on an episode of the American Revolution can download a manuscript from New York, a pamphlet from Chicago, and a map from San Francisco in order to study them side by side. Unfortunately, he or she will not be able to consult any recent books, because copyright laws keep virtually everything published after 1923 out of the public domain. But the courts, which are considering a flurry of cases about the “fair use” of copyright, may sustain a broad-enough interpretation for the DPLA to make a great deal of post-1923 material available for educational purposes.

A small army of volunteer “Community Reps,” mainly librarians with technical skills, is fanning out across the country to promote various outreach programs sponsored by the DPLA. They reinforce the work of the service hubs, which concentrate on public libraries as centers of collection-building. A grant from the Bill and Melinda Gates Foundation is financing a Public Library Partnerships Project to train local librarians in the latest digital technologies. Equipped with new skills, the librarians will invite people to bring in material of their own—family letters, high school yearbooks, postcard collections stored in trunks and attics—to be digitized, curated, preserved, and made accessible online by the DPLA. While developing local community consciousness about culture and history, this project will also help integrate local collections in the national network.

Spin-off projects and local initiatives are also favored by what the DPLA calls its “plumbing,” that is, the technological infrastructure, which has been designed to promote user-generated apps or digital tools connected to the system by means of an API (application programming interface). The API has already registered seven million hits. Among the results is a tool for digital browsing: the user types in the title of a book, and images of spines of books, all related to the same subject, all in the public domain, appear on the screen as if they were aligned together on a shelf. The user can click on a spine to search one work after another, following leads that extend far beyond the shelf space of a physical library. Another tool makes it possible for a reader to go from a Wikipedia article to all the works in the DPLA that bear on the same subject. These and many other apps have been developed by individuals on their own, without following directives from DPLA headquarters.
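For readers curious about what a tool built on such an API involves, the sketch below shows how a keyword query might be assembled and sent. The article does not describe the API itself. The endpoint (`https://api.dp.la/v2/items`) and the parameter names (`q`, `page_size`, `api_key`) follow the DPLA's public documentation as best understood here, and should be treated as assumptions to verify against that documentation. The function names and the placeholder key are invented for the example.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: item-search endpoint as given in the DPLA's public API docs.
BASE_URL = "https://api.dp.la/v2/items"


def build_search_url(query: str, api_key: str, page_size: int = 5) -> str:
    """Assemble a keyword-search URL (parameter names are assumptions)."""
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_results(url: str) -> dict:
    """Send the request and decode the JSON reply (needs network access and a real key)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Only builds and prints the URL; no request is sent here.
url = build_search_url("Emily Dickinson", api_key="YOUR_KEY_HERE")
print(url)
```

A browsing tool like the virtual bookshelf described above would call something like `fetch_results` and then render the returned records. That rendering is left out because the response format is not covered in the article.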

The spin-offs offer endless educational opportunities. For example, the Emily Dickinson Archive recently developed at Harvard will make available digitized copies of the manuscripts of all Dickinson’s poems. The manuscripts are essential for interpreting the work, because they contain many peculiarities—punctuation, spacing, capitalization—that inflect the meaning of the poems, of which only a few, badly mangled, were published during Dickinson’s lifetime. Nearly every high school student comes across a poem by Dickinson at one time or other. Now teachers can assign a particular poem in its manuscript and printed versions (they often differ considerably) and stimulate their students to develop closer, deeper readings. The DPLA also plans to adapt its holdings to the special needs of community colleges, many of which do not have adequate libraries.

In these and other ways, the DPLA will go beyond its basic mission of making the cultural heritage of America available to all Americans. It will provide opportunities for them to interact with the material and to develop materials of their own. It will empower librarians and reinforce public libraries everywhere, not only in the United States. Its technological infrastructure has been designed to be interoperable with that of Europeana, a similar enterprise that is aggregating the holdings of libraries in the twenty-eight member states of the European Union. The DPLA’s collections include works in more than four hundred languages, and nearly 30 percent of its users come from outside the US. Ten years from now, the DPLA’s first year of activity may look like the beginning of an international library system.

It would be naive, however, to imagine a future free from the vested interests that have blocked the flow of information in the past. The lobbies at work in Washington also operate in Brussels, and a newly elected European Parliament will soon have to deal with the same issues that remain to be resolved in the US Congress. Commercialization and democratization operate on a global scale, and a great deal of access must be opened before the World Wide Web can accommodate a worldwide library.

Letters:

Robert A. Schneider

Overpriced Scholarship: An Exchange

November 6, 2014

Robert Darnton

Robert Darnton’s latest book is The Revolutionary Temper: Paris, 1748–1789. He is the Carl H. Pforzheimer University Professor and University Librarian Emeritus at Harvard. (December 2023)

This Issue

May 22, 2014
