Microsoft contracted to digitize library in 2008

A year from now, 100,000 books from Yale’s libraries will be available online. Just don’t try to search for them on Google.

The University announced last week that it has joined forces with Microsoft to digitize thousands of books from Yale’s library system. The partnership comes just a month after a consortium of 19 academic and research libraries around New England rebuffed Microsoft and Google — citing concerns about the accessibility of the books they scan — and announced they would pay for the digitization of their collections out of their own pockets.

Books requested by patrons are shelved behind the circulation desk of Sterling Memorial Library. Many could be available online in the next year.
Jose Meza

Yale’s move sparked controversy in the book world, with some critics saying the University forwent principle just to save a few dollars. But Yale librarians said the University would not otherwise be able to afford to digitize so many books. Yale has the second-largest university library in the world, with 13 million volumes.

In one front of a larger war between Internet behemoths, Microsoft and Google have aggressively courted university libraries in recent years, promising to subsidize or pay entirely for the cost of digitizing thousands of their books.

But the offer comes with a catch.

If Microsoft digitizes a library’s collection, Web users will only be able to find the books on Microsoft’s search engine. The same goes for Google.

Caught in the Crossfire

Google’s 2005 announcement of partnerships with Harvard, Stanford, Oxford, the University of Michigan and the New York Public Library to digitize millions of books for the world to read was heralded as a watershed moment in the Internet age. The company said it plans to make available more than 15 million volumes over the next decade.

But Google was not alone in its effort for long. Microsoft launched its own book service in 2006. The Microsoft service places digital collections on its Live Search Web site, although the service thus far is much smaller than Google’s service. Among its members are Cornell University, the University of California system and the British Library, where University Librarian Alice Prochaska was once a top official before coming to Yale.

As the two services butted heads, an alternative began to emerge as a way for libraries to sidestep commercial entities altogether. The Open Content Alliance, a nonprofit organization aimed at making collections as accessible as possible, is gaining steam of its own. Backed by the search giant Yahoo, it counts the Smithsonian Institution among its dozens of members. Microsoft initially backed the Open Content Alliance, too, but last year added a restriction against its books’ showing up in searches on competing sites. Google already had such a restriction.

The consortium of 19 libraries that eschewed the corporate digitization efforts announced in September that its member libraries would join the Open Content Alliance in order to preserve the accessibility of their collections on all search engines. The consortium includes Brown, M.I.T., Tufts, Wellesley College, Williams College and several large state institutions.

The project comes as a risk, said Barbara Preece, the executive director of the Boston Library Consortium, the association that represents the libraries. The institutions will incur millions of dollars in scanning costs, she said, all to hold onto the ideal that their collections should be accessible to as many people as possible and not caught in the crossfire of corporate competition.

“We want to ensure that the material is open and available to everybody,” Preece said. “We don’t want to tie the content with any search engine.”

That is not to say Google and Microsoft are negative forces, Preece said — they deserve credit for getting the ball rolling with digitizing materials for the Web.

But the consortium stands on principle.

“We’re going to ensure materials are kept open and freely available,” Preece said. “We’re hoping that others will join us.”

A generous agreement?

Prochaska said in an interview on Thursday that Yale’s agreement with Microsoft was very generous. While the books it scans will be available only on Microsoft’s search engine, the University will receive digital files of all the books that are put online, and the entire digital collection will be linked through the Yale Library Web site and Orbis catalog listings, she said.

“We imagine hundreds of thousands of books ultimately being digitized,” Prochaska said. “A proportion of this material will be books that are only available at Yale, so we will be adding significantly to what’s available to the world at large. It’s very exciting. We’re thrilled.”

Prochaska said the University has not been approached by the Open Content Alliance. Its agreement with Microsoft — which approached the University about an alliance — is non-exclusive, she said, so Yale could partner with other digitization services in the future.

Yale’s agreement with Microsoft is subject to a non-disclosure agreement, officials said, and library administrators would not provide details about the financial arrangement between the University and Microsoft. But Google and Microsoft typically subsidize the cost of digitization either in large part or entirely.

“The key fact is that we are obtaining these wonderful digital assets at a fraction of what it would cost us to do that,” Prochaska said.

But with an endowment exceeding $22 billion, the University was pinching pennies needlessly, critics charge.

The cost of scanning Yale’s 100,000 volumes will be in the millions, officials said. The Open Content Alliance estimates the cost of scanning to be 10 cents per page, putting the price of digitizing 100,000 average-length books at somewhere between $2 million and $3 million.

By comparison, Yale’s library system has a fiscal year 2008 budget of $89.6 million, according to University budget documents provided to the News. The cost of digitization would amount to less than one-half of one percent of the $615 million the University will spend on capital projects this year.
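For readers who want to check the arithmetic behind those figures, the short sketch below reruns it. The book count, the 10-cents-per-page rate and the $615 million capital budget come from the figures quoted above. The page counts of 200 and 300 per book are an assumption made only for illustration, since no page count for an "average-length" book is given, and the function names are hypothetical.

```python
# Rough check of the scanning-cost figures quoted in the article.
# ASSUMPTION: 200 to 300 pages per "average-length" book. The article
# gives no page count, so these bounds are illustrative only.

BOOKS = 100_000                # volumes Yale plans to put online
CENTS_PER_PAGE = 10            # Open Content Alliance estimate
CAPITAL_BUDGET = 615_000_000   # dollars, capital projects this year


def scanning_cost(pages_per_book: int) -> float:
    """Return the total scanning cost in dollars."""
    # Work in whole cents first, then convert to dollars.
    return BOOKS * pages_per_book * CENTS_PER_PAGE / 100


def share_of_capital_budget(cost: float) -> float:
    """Return a dollar cost as a percentage of the capital budget."""
    return cost / CAPITAL_BUDGET * 100


if __name__ == "__main__":
    for pages in (200, 300):
        cost = scanning_cost(pages)
        print(f"{pages} pages/book: ${cost:,.0f}, "
              f"{share_of_capital_budget(cost):.2f}% of capital budget")
```

Under the assumed page counts, the total lands in the $2 million to $3 million range, and even the high end stays below one-half of one percent of the capital budget.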

Doron Weber, a program director for the Alfred P. Sloan Foundation, which has granted money to the Open Content Alliance to aid libraries in digitizing parts of their collections, called for more universities — especially those with resources like Yale’s — to step up to the plate and digitize their own books.

“[Yale has] one of the great collections,” Weber said. “It really should be shared with everyone else. Going with a commercial company is not doing that.”

Weber called the Boston Library Consortium’s decision a “huge step” in encouraging libraries to stand up for the principle of openness.

“It’s important to do it right,” he said.

Yale officials said they expressed displeasure to Microsoft about the company’s search-engine restrictions. If librarians decide some of the University’s treasures are too valuable to be restricted to only one search engine, they could digitize them on their own.

Online titles

Google’s early efforts were mired in litigation because publishers sued to block the company from providing copyrighted material on the Internet. The University will not have to deal with that concern because the 100,000 books it plans to place online were published before 1923 and therefore will not be subject to copyright restrictions, officials said when they announced the Microsoft partnership.

The bulk of the digitized collection will be drawn from the University’s materials on art history, history and religion, said Jennifer Weintraub, the library’s digital collections specialist. The books will be taken mostly from the Seeley G. Mudd Library and the Library Shelving Facility in Hamden, she said.

Those two libraries are not readily accessible for most undergraduates. The soon-to-be-renovated Mudd requires a walk up Science Hill, while books at the LSF must be requested a day in advance so they can be shipped to campus.

That makes books in those facilities prime candidates to be digitized, Weintraub said.

“Many of the librarians know what an incredible collection we have, but it is frustrating when a book is at the shelving facility and students may not want to take the time to request it, [or] it looks old and people won’t be interested,” Weintraub said. “But really, there is amazing stuff in our library.”

Librarians will choose the materials to be digitized merely by wading into the stacks where books of a pertinent topic are located and then beginning at a random letter and seeing what looks appealing, she said.

The University had previously worked with Microsoft’s scanning vendor for its own digitization projects, and the vendor had proven reliable, officials said. That made Microsoft particularly attractive over its competitors, librarians said, because administrators could feel confident that Yale’s collection would not be damaged or lost during the scanning process.

Although this initiative is Yale’s first major step toward putting full-length books online, the University has a significant collection of texts and images already on the Internet. More than 60,000 documents from the Beinecke Rare Book and Manuscript Library are available online, with thousands more added monthly, according to the Library Web site. More than 150,000 digitized images from the University’s Visual Resources Collection are also available online.

Playing Catch-Up

But some of Yale’s peers already have thousands of books fully digitized. At Harvard, one of Google’s original partners, more than 40,000 books were added online as part of a pilot program with Google in 2005, according to the university.

Harvard should complete its full-scale digitization project sometime in 2008, said Peter Kosewski, the director of publications and communications for the Harvard University Library. By that time, Harvard will have added over one million volumes to the Internet, he said.

Harvard’s collection will not be available on search engines other than Google. But Kosewski said university administrators do not see that as a major problem.

“We don’t feel that our collections have been commercialized,” he said. Putting a million volumes online — volumes that previously were only available to those in Cambridge — is a “great public service,” he said.

In an e-mail, Google spokeswoman Jennifer Parson noted that the company’s agreements with libraries are non-exclusive and that the company’s main goal is to make all of the world’s books discoverable online.

“Digitizing the world’s information is a tremendous undertaking,” Parson said. “We support digitization efforts throughout the world and hope to work with anyone to accomplish our mutual goal of making this information searchable online.”

Representatives of Microsoft declined to be interviewed for this article, but in a news release, the company called the deal with Yale a “significant alliance” and said Microsoft looks forward to working with the University.

“The [Yale] library has a wealth of materials in its general and special collections, and we are delighted to help bring these treasures to the attention of a broader audience,” Danielle Tiedt, a Microsoft general manager, said.

Weintraub said it was a “very unfortunate situation” that Microsoft will limit the digitized collection’s accessibility by allowing it to be listed only on Microsoft’s search engine. But she and Prochaska said the deal was too good for the University to pass up.

“It’s such an expensive proposition to digitize these books,” Weintraub said. “I don’t think that for a project like this, Yale could have done it alone.”

Comments

  • Anonymous

    boo--should've been google!

  • Anonymous

    Boo!!! This is dumb to go with Google over Microsoft as Google clearly has more expertise in this area. Obviously the decision was based on money and not the best way to digitize the library collection.

  • Anonymous

    factually inaccurate statement ^

  • Anonymous

    Microsoft will inevitably make sure that the book searching and reading works best on computers running Windows. This is both unfair to students who don't use Windows and foolish, given that Microsoft's market share at Yale is in pretty obvious decline.