Creating a Collaborative, Web-Based Legal Information Archive: With Online Materials Disappearing and Changing Almost Daily, Three Libraries Banded Together to Form a Shared Archive to Preserve and Provide Access to Digital Legal and Public Policy Information

Article excerpt

In 2008, a remarkable milestone was announced on the Google blog: the popular Internet search engine's crawlers had discovered and indexed 1 trillion unique and simultaneously active URLs. To put that figure into context, just 10 years earlier, Google had detected only--only!--26 million Web pages. That's an average of more than 99 billion URLs added to the Web per year, and more than 273 million pages added per day, since 1998.
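The averages quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes the 1998 page count and the 2008 URL count are directly comparable, that the period is exactly 10 years, and that a year has 365 days. The function and constant names are illustrative and do not come from the Google announcement.

```python
# Figures quoted in the paragraph above.
URLS_2008 = 1_000_000_000_000  # 1 trillion unique URLs reported in 2008
PAGES_1998 = 26_000_000        # 26 million Web pages detected in 1998
YEARS = 10                     # assumption: exactly 10 years between the two counts
DAYS_PER_YEAR = 365            # assumption: leap days are ignored


def average_growth(start, end, years, days_per_year=DAYS_PER_YEAR):
    """Return the average number of items added per year and per day."""
    added = end - start
    per_year = added / years
    per_day = per_year / days_per_year
    return per_year, per_day


per_year, per_day = average_growth(PAGES_1998, URLS_2008, YEARS)
print(f"Average added per year: {per_year:,.0f}")
print(f"Average added per day:  {per_day:,.0f}")
```

Under these assumptions the yearly average falls just short of 100 billion and the daily average just short of 274 million, consistent with the "more than 99 billion" and "more than 273 million" figures in the text.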

While Google's milestone gives us an idea of the rapid growth of the Internet, it fails to convey the extent to which URLs have changed or disappeared during that 10-year period. Web-based materials frequently disappear as URLs change and Web sites are updated or deleted. Moreover, file formats and applications evolve rapidly, rendering previous generations of digital content obsolete and unreadable.

Changes to and deletions of Web-based information are especially challenging to the fields of law and public policy. Important legal materials are increasingly being created digitally and distributed online rather than published on paper. This growing body of Web-published legal information includes government documents, federal and state agency publications, commission and task force reports, court opinions, and other judicial publications. Legal scholarship itself increasingly relies on digitally created, Web-based sources. Web-based articles, documents, and comments posted on legal blogs have been cited in prestigious law reviews as well as in briefs submitted to and rulings of state and federal courts, including the U.S. Supreme Court.

Law libraries historically have collected, preserved, and provided access to the printed legal heritage of nations, and recently a group of them began to take responsibility for the digital legal heritage of the United States. The Chesapeake Project was launched by a team of state and university law librarians with the goal of forming a shared legal information archive for preserving and providing permanent access to legal materials published to the World Wide Web. This article explores the Chesapeake Project's background, collaborative structure, archiving tools, collections, copyright policies, and evaluation findings, and discusses its future prospects.

Establishing the Project

In 2003, the late director of the Georgetown University Law Library, Robert Oakley, along with a team of key staff librarians, convened a conference titled "Preserving Legal Information for the 21st Century: Toward a National Agenda" at the Georgetown Law Center in Washington, D.C. The goals of this conference were to advance the law library community's dialogue on the topic of preserving legal materials in the digital age and to encourage action to ensure the availability of these materials to future scholars, lawmakers, and citizens. The experts attending the conference responded to this call to action by forming a new organization called the Legal Information Preservation Alliance, or LIPA.

LIPA was created to provide the American law library community with the leadership, guidance, and organizational backing to support the preservation of legal information on a national scale. In 2006, LIPA distributed its strategic plan, which listed the development of a pilot project to preserve digitally created legal information as one of its key organizational goals. To advance this goal, the Georgetown Law Library and the state law libraries of Maryland and Virginia (all members of LIPA) established the Chesapeake Project.

The Chesapeake Project began as a two-year pilot program to preserve legal information published directly to the Web and to test the feasibility of forming a collaborative, nationwide digital preservation initiative within the law library community. In late 2006, the project's three libraries began laying the groundwork by developing a project structure and evaluating Web harvesting systems, digital preservation tools, and digital archive options. …