Digitizing Dissertations for an Institutional Repository: A Process and Cost Analysis

Objective: This paper describes the Lamar Soutter Library's process and costs associated with digitizing 300 doctoral dissertations for a newly implemented institutional repository at the University of Massachusetts Medical School.

Methodology: Project tasks included identifying metadata elements, obtaining and tracking permissions, converting the dissertations to an electronic format, and coordinating workflow between library departments. Each dissertation was scanned, reviewed for quality control, enhanced with a table of contents, processed through an optical character recognition function, and added to the institutional repository.

Results: Three hundred and twenty dissertations were digitized and added to the repository for a cost of $23,562, or $0.28 per page. Seventy-four percent of the authors who were contacted (n=282) granted permission to digitize their dissertations. Processing time per title was 170 minutes, for a total processing time of 906 hours. In the first 17 months, full-text dissertations in the collection were downloaded 17,555 times.
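The totals reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above. The derived values (implied page count, cost per title, average monthly downloads) are not reported in the abstract and are approximations, since the per-page cost is itself rounded. The function and variable names are illustrative.

```python
# Figures as reported in the Results paragraph.
TITLES = 320               # dissertations digitized
TOTAL_COST_USD = 23_562    # total project cost
COST_PER_PAGE_USD = 0.28   # reported (rounded) cost per page
MINUTES_PER_TITLE = 170    # processing time per title
DOWNLOADS = 17_555         # full-text downloads
MONTHS = 17                # reporting period for downloads


def derived_figures():
    """Return values derived from the reported figures.

    These are illustrative cross-checks, not numbers taken from the paper.
    """
    return {
        # 320 titles x 170 minutes, converted to hours
        "total_hours": TITLES * MINUTES_PER_TITLE / 60,
        # Approximate, because the per-page cost is rounded to the cent
        "implied_pages": TOTAL_COST_USD / COST_PER_PAGE_USD,
        "cost_per_title_usd": TOTAL_COST_USD / TITLES,
        "downloads_per_month": DOWNLOADS / MONTHS,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:,.2f}")
```

At 170 minutes per title, 320 titles come to roughly 906.7 hours, which agrees with the reported total of 906 hours.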

Conclusion: Locally digitizing dissertations or other scholarly works for inclusion in institutional repositories can be cost effective, especially if small, defined projects are chosen. A successful project serves as an excellent recruitment strategy for the institutional repository and helps libraries build new relationships. Challenges include workflow, cost, policy development, and copyright permissions.


Digitization projects in libraries seem ubiquitous as libraries become increasingly involved in the acquisition, development, and management of digital information [1]. Libraries typically target archival and special collections materials such as historical documents and photographs [2]. Projects to digitize vast collections of books began as early as 1971 with Project Gutenberg and are now getting widespread media attention with the launch of Google Book Search, the Internet Archive, and others [3]. In an April 2007 list of ten assumptions about the future that would significantly impact academic libraries and librarians, the Association of College & Research Libraries Research Committee placed digitization at the top of the list, stating, "There will be an increased emphasis on digitizing collections, preserving digital archives, and improving methods of data storage and retrieval" [4].

A related emergent trend in academic libraries is the implementation of institutional repositories (IRs), digital collections that capture and preserve the intellectual output of university communities [5]. A search of OpenDOAR, the Directory of Open Access Repositories, returns 298 academic repositories in North America [6]. Health sciences libraries are among those contributing to this trend; of 125 libraries that responded to a 2006 supplementary survey for the Annual Statistics of Medical School Libraries in the United States and Canada, 28 have established IRs and 70 are planning to add or are considering offering a repository [7]. According to Foster and Gibbons, libraries build IRs because they "provide an institution with a mechanism to showcase its scholarly output, centralize and introduce efficiencies to the stewardship of digital documents of value, and respond proactively to the escalating crisis in scholarly communication" [8].

Medical librarians are just beginning to report their experiences with institutional repositories in the professional literature [9-13]. In one case study, Krevit and Crays [13] describe challenges that the Texas Medical Center experienced in piloting a multi-institutional repository, including copyright concerns and lack of faculty participation. An analysis by Singarella and Schoening [14] of the surveys conducted between 2005 and 2007 by the Association of Academic Health Sciences Libraries and a survey conducted in 2006 by the Association of Research Libraries [15] confirmed that the challenges experienced at the Texas Medical Center were not unique. …