From Disaster Recovery to Digital Preservation: Libraries Face Enormous Challenges in Finding Ways to Preserve Their Collections as They Move More Deeply into the Digital Arena

Over the course of my career, I've seen libraries become increasingly involved with digital content. Today, hardly any aspect of what libraries do remains untouched. Collections have shifted from physical to digital forms, though not necessarily uniformly. This shift has reshaped many aspects of how libraries operate. It has profound implications not only for how they provide access to materials but especially for how these digital collections will be preserved for future generations. Libraries face enormous challenges in finding ways to preserve their collections as they move more deeply into the digital arena.

Disaster Recovery Versus Digital Preservation

The fragility of digital content cannot be overstated. Many things can go wrong, and the potential for catastrophic loss is great. Hardware will eventually fail, taking active copies of data with it. Software can malfunction in ways that corrupt files. Malware can invade computer systems in ways that not only disrupt access but might also destroy data. In this era of activist hackers, any organization, even a library, can suddenly find itself the victim of an intense and sophisticated attack.

Any organization with operations that depend on computer systems and their associated data will naturally implement procedures for disaster recovery. These procedures ensure that the organization can quickly recover from any sort of problem with its technical infrastructure, including restoring any lost data. Organizations with heavy operational dependence on their computer systems, such as hospitals, financial institutions, or global internet-based businesses, will have one or more standby systems that can instantly take over should the primary systems fail. Organizations with globally distributed infrastructure, such as Google, have architectures with massive redundancy designed to work around the failure of any given hardware or software component. Libraries and other organizations with more limited resources tend not to have this level of failover redundancy. Instead, they focus on keeping up-to-date backups that can be restored once equipment has been repaired following a failure. Disaster recovery centers on maintaining the continuity of the organization, with the emphasis on restoring data to its current state.

Digital preservation builds on a base level of disaster recovery, extending the scope of concern into the distant future. Rather than simply restoring data to its current state, digital preservation involves creating processes and infrastructure capable of carrying data forward for hundreds of years, on the assumption that any formats, media, and equipment in place today will become obsolete and unsupported. The challenges of this long-term objective include not only creating highly resilient storage architectures but also maintaining the metadata needed to re-create the content in future formats. Digital preservation also requires an organizational, strategic commitment to migrating data forward through the inevitable cycles of technology. While disaster recovery ensures that an organization can recover from a particular failure, digital preservation also addresses the possibility of widespread and enduring failures, including disasters with extensive geographic reach and communication disruptions that might last for days, weeks, or years. The Open Archival Information System (OAIS) reference model provides some guidance for designing repositories for digital preservation.

The differing strategic positions of the commercial sector and libraries come into play when considering disaster recovery versus digital preservation. In a business environment where strategies focus on the shorter term, society cannot necessarily count on publishers and other content providers to invest in anything beyond disaster recovery and business continuity. While they certainly have an interest in the longer-term preservation of their content, they are much more likely to defer forward migration to new formats and other key processes until the need becomes more immediate. …