Academic and research libraries are well versed in collecting material from the print world. The collections now being produced on the Web, and those that will be produced in the future, require urgent attention if they are to be acquired, preserved, and made accessible for future research. Many of the skills that librarians have honed through years of collecting in the print-based world apply to digital collection development, but putting them to use will require librarians to ramp up their technical skills and actively embrace digital content in current and future collection-development work. This paper reports on an exploratory project that aims to apply existing skills and knowledge to collecting materials from the Internet and to lay the groundwork for future collection development.
In the print world, the selection and acquisition of library materials is a well-defined and well-known system, developed over decades of work in the profession. Bibliographic output is generally controlled, and librarians can rely on their agents or vendors to obtain the books and journals they require. This system of identifying and procuring known items also translates well into the controlled digital domain of electronic resources: databases, e-books, and e-journals. Likewise, archivists have developed refined methods for identifying and acquiring specialized collections of letters, diaries, memorabilia, and primary literature that form the basis for social and historical research.
A significant and rapidly growing shadow world of material of equal importance now exists on the Internet and deserves attention. This fugitive literature contains important manifestations of present-day social and political history, art, literature, and primary cultural output. In every way, this literature is contemporary primary source material on which future research will rely. Its existence raises the question of how subject specialists and collection development librarians can take the selection and procurement skills they have already mastered and refine or expand them to address the large and growing body of material on the Web.
The research presented here reflects efforts to understand the challenges of collecting from the Web. The questions this project sought to answer are:
* How can we discover and locate this material?
* How can we associate it with known published material (either in print or electronically) where it might enrich an existing collection?
* How can we modify and transfer the bibliographic principles already existing in the profession to the work of gathering more transitory documents from the Web?
The issues of long-term archiving of a potentially massive collection, and of providing adequate metadata to enable access, are corollary questions of equal significance, but they are not the primary focus of this research.
A review of the collection development literature begins with the standard texts that detail how items are identified, selected, obtained, and processed (cataloged). Bonk and Magrill's Building Library Collections, Gorman's Collection Management for the 21st Century, and Johnson's Fundamentals of Collection Development and Management provide the rubric for acquisition and collection-development activities in most libraries. (1) This established professional framework supported a subject-based approach, allowing us to align the goals of this project with the standards of our profession. While this traditional library literature set the stage, the archival literature, especially recent research on archiving Web documents, helped us understand current efforts to capture collections on the Web. Although this is not yet a widely embraced area of research, some seminal works have been produced. Pearce-Moses and Kaczmarek examine the challenges a state library faces in fulfilling its mandate to collect and provide access to official reports and documents. …