Google Books as a General Research Collection
Jones, Edgar, Library Resources & Technical Services
The current study attempts to measure the extent to which "full view" volumes contained in Google Books constitute a viable general research collection for works in the public domain, using as a reference collection the catalog of a major nineteenth-century research library and using as control collections--against which the reference catalog also would be searched--the online catalogs of two other major research libraries: one that was actively collecting during the same period and one that began actively collecting at a later date. A random sample of 398 entries was drawn from the Catalogue of the Library of the Boston Athenaeum, 1807-1871, and searched against Google Books and the online catalogs of the two control collections to determine whether Google Books constituted such a viable general research collection.
"There's an east wind coming, Watson."
"I think not, Holmes. It is very warm."
"Good old Watson! You are the one fixed point in a changing age. There's an east wind coming all the same, such a wind as never blew on England yet. It will be cold and bitter, Watson, and a good many of us may wither before its blast. But it's God's own wind none the less, and a cleaner, better, stronger land will lie in the sunshine when the storm has cleared."
--Arthur Conan Doyle, His Last Bow
On December 14, 2004, Google announced that it had concluded agreements with five major research libraries to begin what is now known as the Google Books Library Project. (1) The libraries--the so-called Google 5--were the New York Public Library and the libraries of Harvard, Michigan, Oxford, and Stanford universities. These libraries agreed to let Google digitize volumes from their printed book and serial collections in exchange for institutional copies of the digitized volumes. (2) While the agreements set broad parameters for cooperation, Google gave the libraries sole discretion in determining the volumes to be digitized.
The Library Project--and the discretion given the libraries in determining which volumes would be digitized--raises an interesting question: To what extent is Google creating a research collection? Coyle has suggested that the manner in which collections are being selected for inclusion in the Library Project--many being taken en bloc from low-use remote storage facilities--makes it difficult to characterize Google Books as a "collection" in the accepted sense, though for better or worse "it will become a de facto collection because people will begin using it for research." (3) Is this true? Is this testable? Can sheer volume, in fact, render moot the role of selection in this case? The current study attempted to answer these questions.
While the focus of this study was on content digitized by Google through 2008, one should keep in mind that the volume of available digitized content continues to grow. Since the initial Google 5 cooperative agreements at the end of 2004, Google has entered into agreements with an increasing number of research libraries, both in the United States and abroad, while the European Union has begun funding a digitization program of its own centered on the collections of European cultural heritage institutions (libraries, archives, and museums). (4) Initially, there also was competition from elsewhere in the commercial arena, but this proved to be comparatively short-lived. Within a year of the Google announcement, Microsoft, in cooperation with the Internet Archive, began to digitize print content from several libraries under the rubric of Live Search Books. In May 2008 this effort was abandoned, though content already digitized under that program--some 750,000 volumes--remained available via the Internet Archive. (5)
In terms of scope, several of the Library Project partnerships cover both older public domain materials and more recent publications still subject to copyright protection. To this extent they complement Google's partnerships with publishers to provide access to a continuity of content across time periods. …