When I look back at the plight of American research libraries in 2010, I feel inclined to break into a jeremiad. In fact, I want to deliver three jeremiads, because research libraries are facing crises on three fronts; but instead of prophesying doom, I hope to arrive at a happy ending.
I can even begin happily, at least in describing the state of the university library at Harvard. True, the economic crisis hit us hard, so hard that we must do some fundamental reorganizing, but we can take measures to make a great library greater, and we can put our current difficulties into perspective by seeing them in the light of a long history. Having begun in 1638 with the 400 books in John Harvard’s library, we now have accumulated nearly 17 million volumes and 400 million manuscript and archival items scattered through 45,000 distinct collections. I could string out the statistics indefinitely. We collect in more than 350 languages and many different formats. We have 12.8 million digital files, more than 100,000 serials, nearly 10 million photographs, online records of 3.4 million zoological specimens, and endlessly rich special collections, including the largest library of Chinese works outside of China (with the exception of the Library of Congress) and more Ukrainian titles than exist in Ukraine.
We want to make it possible for other people to consult those collections by digitizing large portions of them and making them available, free of charge, to the rest of the world from an online repository. We group the material around themes such as women at work, immigration, epidemics and disease control, Islamic heritage, and scientific explorations—2.3 million pages in all. This Open Collections Program, as we call it, is part of a general policy of opening up our library to the outside world and sharing our intellectual wealth. The latest project is devoted to reading, its practices and history. It involved the digitization of more than 250,000 pages from manuscripts and rare books, including richly annotated works such as Melville’s copy of Emerson’s essays and Keats’s copy of Shakespeare.
There are few places aside from research libraries where rare books and e-books can be brought together. At Harvard we use combinations of them for teaching as well as research. I now teach a seminar on the history of books in our rare book library. It begins with Gutenberg. The students investigate the origins of printing by examining a Gutenberg Bible, the real thing, and they do not just stare at it from a respectful distance, but they are invited to leaf (carefully) through its pages in order to appreciate the varieties of rubrication and typographical design. The seminar ends in a high-tech lab on the bottom floor of Widener Library, where experts in digitization explain how to adjust nuances of color while scanning medieval manuscripts.
Despite financial pressure, we therefore are advancing on two fronts, the digital and the analog. People often talk about printed books as if they were extinct. I have been invited to so many conferences on “The Death of the Book” that I suspect it is very much alive.
In fact, more printed books are produced each year than the year before. Soon there will be a million new titles published worldwide each year. A research library cannot ignore this production on the grounds that our readers are now “digital natives” living in a new “information age.” If the history of books teaches anything, it is that one medium does not displace another, at least not in the short run. Manuscript publishing continued to thrive for three centuries after Gutenberg, because it was often cheaper to produce a small edition by hiring scribes than by printing it. The codex—a book with pages that you turn rather than a scroll that you read by unrolling—is one of the greatest inventions of all time. It has served well for two thousand years, and it is not about to become extinct. In fact, it may be that the new technology used in print-on-demand will breathe new life into the codex—and I say this with due respect to the Kindle, the iPad, and all the rest.
But without neglecting our collections of printed books, we must forge ahead on the other, the digital front. Our purchases of e-resources increased by 25 percent at Harvard last year. We are expanding our enormous Digital Repository Service in a campaign not just to save digital texts but to help solve the problem of preserving them. A new Library Lab is inventing techniques for digital browsing and the preservation of e-mail, websites, and born-digital archives. Our open-access repository, DASH, is making current articles by Harvard faculty available online and free of charge throughout the world. And we plan to collaborate with MIT in building joint digital collections. In short, we are looking far ahead into the twenty-first century, and we hope to help shape the information society of the future.
Still, there is no disguising the fact that research libraries are going through hard times—times so hard that they are inflicting serious damage on the entire world of learning. We face three especially difficult problems, which I would like to discuss by drawing on my own experience, recounted in the form of three jeremiads.
In 1998 I had my first encounter with a problem that now pervades the academic world. It can be described as a vicious circle: the escalation in the price of periodicals forces libraries to cut back on their purchase of monographs; the drop in the demand for monographs makes university presses reduce their publication of them; and the difficulty in getting them published creates barriers to careers among graduate students. Although librarians have lived with this problem for decades, faculty are only dimly aware of its existence—not surprisingly, because it is libraries, not professors, that pay for the journals.
When this problem first dawned on me as chairman of Princeton’s library committee in the 1980s, the price of journals had already increased far more than the inflation rate; and the disparity has continued until today. In 1974 the average cost of a subscription to a journal was $54.86. In 2009 it came to $2,031 for a US title and $4,753 for a non-US title, an increase of more than ten times the rate of inflation. Between 1986 and 2005, the prices for institutional subscriptions to journals rose 302 percent, while the consumer price index went up by 68 percent. Faced with this disparity, libraries have had to adjust the proportions of their acquisitions budgets. As a rule, they used to spend about half of their funds on serials and half on monographs. By 2000 many libraries were spending three quarters of their budget on serials. Some had nearly stopped buying monographs altogether or had eliminated them in certain fields.
Another rule of thumb used to prevail among the better university presses. They could count on research libraries purchasing about eight hundred copies of any new monograph. By 2000 that figure had fallen to three or four hundred, often less, and not enough in most cases to cover production costs. Therefore, the presses abandoned subjects like colonial Latin America and Africa. They fell back on books about local folklore or cooking or birds, works that fit into niches or could be marketed to a broader public but that had little to do with scholarly research. And graduate students fell victim to the notorious syndrome of publish or perish.
As president of the American Historical Association in 1999, I thought I could do something, at least in a small way, to reverse this trend. I persuaded the Andrew W. Mellon Foundation to finance a program, called Gutenberg-e, that would award prizes to the best Ph.D. theses in the most endangered fields. The prize money would subsidize the cost of converting the dissertations into books, books of a new kind, electronic books that would take advantage of the new technology to incorporate all sorts of new elements—film clips, recordings, images, and whole collections of documents. The originality and the quality of these e-books would legitimate a new form of scholarly communication and revive the monograph.
One of the first questions that the people at Mellon asked me was “What is your business plan?” Although I had never heard of a business plan, I soon began to appreciate the economic conditions of scholarship. Columbia University Press developed a program to sell the e-books to research libraries as a package for a moderate subscription price. The libraries responded favorably, but the scholars had difficulty in producing their books on time, the pipeline became clogged, and the delayed output hurt sales. In the end, after seven years of struggle, we produced a fine series of thirty-five books, and we had begun to cover our costs. But Columbia, like many university presses, came under severe financial pressure. It decided that it could not continue the series after the Mellon grant ran out. The books were assimilated into the Humanities E-Book program developed by the American Council of Learned Societies, and they are still available online. But Gutenberg-e did not open up an escape route from the problems of sustainability that were plaguing academic life.
A few years later, “sustainability” had become a buzzword, and the inflationary spiral of journal prices had continued unabated. In 2007 I became director of the Harvard University Library, a strategic position from which to take the full measure of the business constraints on academic life. Although economic conditions had worsened, the faculty’s understanding of them had not improved.
How many professors in chemistry can give you even a ballpark estimate of the cost of a year’s subscription to Tetrahedron (currently $39,082)? Who in medical schools has the foggiest notion of the price of The Journal of Comparative Neurology ($27,465)? What physicist can come up with a reasonable guess about the average price of a journal in physics ($3,368), and who in the humanities can compare that with the average price of a journal in language and literature ($275) or philosophy and religion ($300)?
Librarians who buy these subscriptions for the use of faculty and students can shower you with statistics. In 2009, Elsevier, the giant publisher of scholarly journals based in the Netherlands, made a $1.1 billion profit in its publishing division, yet 2009 was a disastrous year for library budgets. Harvard’s seventy-three libraries cut their expenditures by more than 10 percent, and other libraries suffered even greater reductions, but the journal publishers were not impressed. Many of them raised their prices by 5 percent and sometimes more. This year, the publishers of the several Nature journals announced that they were increasing the cost of subscriptions for libraries in the University of California system by 400 percent. Profit margins of journal publishers in the fields of science, technology, and medicine recently ran to 30–40 percent; yet those publishers add very little value to the research process, and most of the research is ultimately funded by American taxpayers through the National Institutes of Health and other organizations.
University libraries have little defense against excessive pricing. If they cancel a subscription, the faculty protest about being cut off from the circulation of knowledge, and the publishers impose drastic cancellation fees. Those fees are written into contracts, which often cover “bundles” of journals, sometimes hundreds of them, over a period of several years. The contracts provide for annual increases in the cost of the bundle, even though a library’s budget may decrease; and the publishers usually insist on keeping the terms secret, so that one library cannot negotiate for cheaper rates by citing an advantage obtained by another library. A recent court case in the state of Washington makes it seem possible that publishers will no longer be able to prevent the circulation of information about their contracts. But they continue to sell subscriptions in bundles. If in negotiating the renewal of a contract a library attempts to unbundle the offer in order to eliminate the least desirable journals, the publishers commonly raise the prices of the other journals so much that the total cost remains the same.
While prices continued to spiral upward, professors became entrapped in another kind of vicious circle, unaware of the unintended consequences. Reduced to essentials, it goes like this: we academics devote ourselves to research; we write up the results as articles for journals; we referee the articles in the process of peer reviewing; we serve on the editorial boards of the journals; we also serve as editors (all of this unpaid, of course); and then we buy back our own work at ruinous prices in the form of journal subscriptions—not that we pay for it ourselves, of course; we expect our library to pay for it, and therefore we have no knowledge of our complicity in a disastrous system.
Professors expect services from their libraries, even if they never set foot in them and consult Tetrahedron or The Journal of Comparative Neurology from computers in their labs. A few, however, have stared the problem in the face and seized it by the horns. In 2001 scientists at Stanford and Berkeley circulated a petition calling for their colleagues to submit articles only to open-access journals—that is, journals that made them available from digital repositories free of charge, either immediately or after a delay.
The effectiveness of such journals had been proven by BioMed Central, a British enterprise, which had been publishing a whole series of them since 1999. Led by Harold Varmus, a Nobel laureate who is now director of the National Cancer Institute, American researchers allied with the Public Library of Science founded their own series, beginning with PLoS Biology in 2003. Foundations provided start-up funding, and ongoing publication costs were covered by the research grants received by the authors of the articles. Thanks to rigorous peer review and the prestige of the authors, the PLoS publications were a great success.
According to citation indexes and statistics of hits, open-access journals were consulted more frequently than most commercial publications. By 2008, when the National Institutes of Health required the recipients of its grants to make their work available through open access—although it permitted an embargo of up to twelve months—cracks were appearing everywhere in the commercial monopoly of publishing in the medical sciences.
But what could be done in all the other disciplines, especially those in the humanities and social sciences, where grants are not so generous, if they exist at all? Several universities passed resolutions in favor of open access and established digital repositories for articles, but the compliance rate of the professors, often 4 percent or less, made them look ineffective. At Harvard we developed a new model. By a unanimous vote on February 12, 2008, professors in the Faculty of Arts and Sciences bound themselves to deposit all of their future scholarly articles in an open-access repository to be established by the library and also granted the university permission to distribute them.
This arrangement had an escape clause: anyone could refuse to comply by obtaining a waiver, which would be granted automatically. In this way, professors retained the liberty to publish in closed-access journals, which might refuse to accept an article available elsewhere on open access or might require an embargo. This model has now spread to other faculties at Harvard and to other universities, but it is not a business model. If the monopolies of price-gouging publishers are to be broken, we need more than open-access repositories. We need open-access journals that will be self-sustaining.
A supplementary program at Harvard now subsidizes publishing fees for articles submitted to open-access journals, up to a yearly limit, for each professor. The idea is to reverse the economics of journal publishing by covering costs, rationally determined, at the production end instead of by paying for an exorbitant profit in addition to the production costs at the consumption end. If other universities adopt the same policy and if professors apply pressure on editorial boards, journals will shift, little by little, one after the other, to open access. The Compact for Open-Access Publishing Equity (COPE), launched this year, is an attempt to create a coalition of universities to push journal publishing in this direction. It also envisages subsidies for authors who cannot expect financial help from grants or their home universities.
If COPE succeeds, it could save billions of dollars in library budgets. But it will only succeed in the long run. Meanwhile, the prices of commercial journals continue to rise, and the balance sheets of university presses continue to sink into the red. In 2003 Walter Lippincott, the director of Princeton University Press, predicted that twenty-five of the eighty-two university presses in the United States would disappear within five years. They are still alive, but many are hanging on by their fingernails. They may find a second life by publishing online and by taking advantage of technological innovations such as the Espresso Book Machine, which can download an electronic text from a database, print it out within four minutes, and make it available at a moderate price as an instant print-on-demand paperback.
But just when this glimmer of hope appeared on the horizon, it was overshadowed by the most powerful technological innovation of them all: relevance-ranking search engines linked to gigantic databases, as in the case of Google Book Search, which is already providing readers with access to millions of books. This brings me to Jeremiad 3.
Google represents the ultimate in business plans. By controlling access to information, it has made billions, which it is now investing in the control of the information itself. What began as Google Book Search is therefore becoming the largest library and book business in the world. Like any commercial enterprise, Google has a primary responsibility to make money for its shareholders. Libraries exist to get books to readers—books and other forms of knowledge and entertainment, provided for free. The fundamental incompatibility of purpose between libraries and Google Book Search might be mitigated if Google could offer libraries access to its digitized database of books on reasonable terms. But the terms are embodied in a 368-page document known as the “settlement,” which is meant to resolve another conflict: the suit brought against Google by authors and publishers for alleged infringement of their copyrights.
Despite its enormous complexity, the settlement comes down to an agreement about how to divide a pie—the profits to be produced by Google Book Search: 37 percent will go to Google, 63 percent to the authors and publishers. And the libraries? They are not partners to the agreement, but many of them have provided, free of charge, the books that Google has digitized. They are being asked to buy back access to those books along with those of their sister libraries, in digitized form, for an “institutional subscription” price, which could escalate as disastrously as the price of journals. The subscription price will be set by a Book Rights Registry, which will represent the authors and publishers who have an interest in price increases. Libraries therefore fear what they call “cocaine pricing”—a strategy of beginning at a low rate and then, when customers are hooked, ratcheting up the price as high as it will go.
To become effective, the settlement must be approved by the US District Court for the Southern District of New York. The Department of Justice has filed two memoranda with the court that raise the possibility, indeed the likelihood, that the settlement could give Google such an advantage over potential competitors as to violate antitrust laws. But the most important issue looming over the legal debate is one of public policy. Do we want to settle copyright questions by private litigation? And do we want to commercialize access to knowledge?
I hope that the answer to those questions will lead to my happy ending: a National Digital Library—or a Digital Public Library of America (DPLA), as some prefer to call it. Google demonstrated the possibility of transforming the intellectual riches of our libraries, books lying inert and underused on shelves, into an electronic database that could be tapped by anyone anywhere at any time. Why not adapt its formula for success to the public good—a digital library composed of virtually all the books in our greatest research libraries available free of charge to the entire citizenry, in fact, to everyone in the world?
To dismiss this goal as naive or utopian would be to ignore digital projects that have proven their worth and feasibility throughout the last twenty years. All major research libraries have digitized parts of their collections. Since 1995 the Digital Library Federation has worked to combine their catalogues or “metadata” into a general network. More ambitious enterprises such as the Internet Archive, Knowledge Commons, and Public.Resource.Org have attempted digitization on a larger scale. They may be dwarfed by Google, but several countries are now determined to out-Google Google by scanning the entire contents of their national libraries.
In December 2009 President Nicolas Sarkozy of France announced that he would make €750 million available for digitizing the French cultural “patrimony.” The National Library of the Netherlands aims to digitize within ten years every Dutch book, newspaper, and periodical produced from 1470 to the present. National libraries in Japan, Australia, Norway, and Finland are digitizing virtually all of their holdings; and Europeana, an effort to coordinate digital collections on an international scale, will have made over ten million objects—from libraries, archives, museums, and audiovisual holdings—freely accessible online by the end of 2010.
If these countries can create national digital libraries, why can’t the United States? Because of the cost, some would argue. Far more works exist in English than in Dutch or Japanese, and the Library of Congress alone contains 30 million volumes. Estimates of the cost of digitizing one page vary enormously, from ten cents (the figure cited by Brewster Kahle, who has digitized over a million books for the Internet Archive) to ten dollars, depending on the technology and the required quality. At Kahle’s rate, even if those 30 million volumes averaged three hundred pages apiece—nine billion pages in all—the bill would come to roughly $900 million. It should therefore be possible to digitize everything in the Library of Congress for less than Sarkozy’s €750 million—and the cost could be spread out over a decade.
The greatest obstacle is legal, not financial. Presumably, the DPLA would exclude books currently being marketed, but it would include millions of books that are out of print yet covered by copyright, especially those published between 1923 and 1964, a period when copyright coverage is most obscure, owing to the proliferation of “orphans”—books whose copyright holders have not been located. Congress would have to pass legislation to protect the DPLA from litigation concerning copyrighted, out-of-print books. The rights holders of those books would have to be compensated, yet many of them, especially among academic authors, might be willing to forgo compensation in order to give their books new life and greater diffusion in digitized form. Several authors protested against the commercial character of Google Book Search and expressed their readiness to make their work available free of charge in memoranda filed with the New York District Court.
Perhaps even Google itself could be enlisted in the cause. It has digitized about two million books in the public domain. It could turn them over to the DPLA as the foundation of a collection that would grow to include more recent books—at first those from the problematic period of 1923–1964, then those made available by their rights holders. Google would lose nothing by this generosity; each digitized book that it made available could, if other donors agree, be identified as a contribution from Google; and it might win admiration for its public-spiritedness.
Even if Google refused to cooperate, a coalition of foundations could provide enough to finance the DPLA, and a coalition of research libraries could provide the books. By working systematically through their holdings, these libraries could form a great collection. It would conform to the highest standards in its bibliographical apparatus, its scanning, its editorial decisions, and its commitment to preservation for the use of future generations.
Should the Google Book Search agreement not be upheld by the court, its unraveling would come at an extraordinary moment in the development of an information society. We have now reached a period of fluidity, uncertainty, and opportunity. Things have come undone, and they can be put together in new ways, subordinating private profit to the public good and providing everyone with access to a commonwealth of culture.
Would a Digital Public Library of America solve all the other problems—the inflation of journal prices, the economics of scholarly publishing, the unbalanced budgets of libraries, and the barriers to the careers of young scholars? No. Instead, it would open the way to a general transformation of the landscape in what we now call the information society. Rather than better business plans (not that they don’t matter), we need a new ecology, one based on the public good instead of private gain. This may not be a satisfactory conclusion. It’s not an answer to the problem of sustainability. It’s an appeal to change the system.
—November 23, 2010