Magazine article Computers in Libraries

Saving the Web: If We Don't Save Today's Web Now, It Might Not Be There in the Future

Article excerpt

The web grows, the web breaks, the web goes away. Every day we add more to it. We add more, while we forget about what we added long ago, which is disassembled or crashes and falls away and is forgotten. What's done is done and cannot be undone. It is the way of things.

Leave it to some folks to try to capture what's there before it disappears. Most of you know about the Internet Archive, which has been crawling and storing huge swaths of the web before it goes away and serving it up for free afterward so we can revisit web years gone by and long-forgotten sites. If you haven't used it lately, go visit it now--really, it's worth a reminder of how useful this service is. Try looking for your own library's site by typing its URL in the Wayback Machine box near the top of the page. Jump around in the timeline, find a date to click on in a bygone year where it has a crawl, wait for it to load (it can be a bit slow, with good reason, which I'll get into in a minute), and reminisce. Remember that redesign project? With the logo over there and the big green ... oh my. Yes, you really did use that as your homepage for a year. If only we knew then what we know now, eh?

Is It Really Saving Everything?

No, it isn't. It's not possible--there's too much of it. But it is crawling, storing, and providing access to a very useful subset of the web as it existed in the past. This is a valuable resource for anyone looking for lost information, organizations, or people who aren't online anymore. It can help you find materials for research into the history of how we communicate or countless other potential thesis topics. But I would like you to remember the key point here--it is not saving everything. After looking up your own organization, look up five others. Think small, like your neighborhood association, PTA, Little League, or a local religious organization. Are those sites on the Internet Archive? Probably some aren't.
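If you'd rather not type five or more URLs into the Wayback Machine box by hand, the same check can be scripted. What follows is only a rough sketch, not anything the Internet Archive prescribes: it assumes the Wayback Machine's public availability endpoint at archive.org/wayback/available and the JSON shape that endpoint has returned (an "archived_snapshots" object with an optional "closest" entry). The function names are my own and purely illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(site, timestamp=None):
    """Build the lookup URL for one site; timestamp is optional (e.g. "2010")."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the URL of the closest archived copy, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_json(url):
    """Fetch a URL over the network and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def check_sites(sites, fetch=fetch_json):
    """Map each site to its closest snapshot URL (None means no crawl found)."""
    return {site: closest_snapshot(fetch(build_query(site))) for site in sites}
```

Calling check_sites with a list of your neighborhood association, PTA, and Little League addresses returns a dictionary. Any entry that comes back as None is a site for which this lookup found no archived copy, which makes it a candidate for the kind of local collecting discussed here.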

Therein lies the opportunity. I think that web archiving is a big potential growth area for library collections. It isn't easy, and it isn't cheap, so it might seem outrageous to suggest it at a time when our budgets are no more stable than they have been for a few years. It's not something every library can or even should take on all at once. But the problem is real: If we don't save today's web now, it might not be there in the future. And if you think about the most compelling collections among your favorite libraries, how many of them are the collections that at first must have seemed like oddball assortments selected by oddball individuals, or the tasteful pluckings of a wealthy aristocrat, or a combination of items that might never have come together if not for some unrepeatable series of events and good (or sometimes, terrible) fortune but now seem like rare gems of foresight and fortune? We build entire institutions around the spun-off sets of resources that as few as one person managed to compile and preserve so well that we now consider these sets priceless.

We're at one of those moments with the web. Much of it is lost, period. The Internet Archive and other major institutions are saving big chunks of it, but you can be sure that many small sites in a corner of the world near you are being lost. Dozens of institutions are building up web archiving programs, but there's still a big opportunity for us to get in the game too. And the good news is that there's a relatively easy way to get started.

Archiving by Subscription

Several years ago, the Internet Archive added Archive-It, a service that supports individual institutions that want to start archiving the web without having to build up all the necessary infrastructure locally. And there's a lot of infrastructure to build up if you want to do it yourself. Crawling just a single modest-sized website can generate gigabytes of information that have to be stored and indexed before they can be made available, and that's not to mention the software involved in each of these processes. …
