Magazine article Information Today

Now You See It, Now You Don't. Unless

Citing a web page as the source for something you know--using a URL as evidence--is ubiquitous. Many people find themselves doing it three or four times before breakfast and five times more before lunch. What happens when your evidence vanishes by dinnertime?

--Jill Lepore, "The Cobweb: Can the Internet Be Archived?" The New Yorker (newyorker.com/magazine/2015/01/26/cobweb)

According to Jill Lepore in the must-read article from which the above quote was taken, the average life of a webpage is about 100 days. As she notes, the embarrassing stuff seems to stick around a lot longer, but it's an indisputable fact that web-based content often goes missing: corporate reports, scholarly articles, government documents, working papers, maps, and creative works of all sorts.

Sometimes, an item changes location, and no one does the appropriate old-to-new mapping. If you're lucky, you can find it via a quick Google search on the title. If the entire website has gone away or has been heavily redesigned, you might not be as fortunate. Someone with a limited notion of archival value may have just nuked "all the old stuff."

Sometimes, a document that began its digital life as freely available disappears behind a paywall. Sometimes, an elected official, a celebrity, or another prominent person or authority figure attempts to have potentially embarrassing or controversial items scrubbed from the internet. And sometimes, reports or other documents get revised, and the older versions are no longer available.

Sometimes, a web-based entity goes away entirely, and whatever was hosted there just disappears. Think GeoCities, a free web-hosting service from way back when (the '90s), which was purchased by Yahoo in 1999 and shut down by the company 10 years later. If you've been around the internet a while, you may have had your own page there. By the time the site bit the dust in 2009, it included 38 million pages.

But maybe your ugly, embarrassing GeoCities page has not completely vanished into the ether. And you have the Internet Archive's Wayback Machine (archive.org/web) to thank/blame for that. During the last few months GeoCities existed, the Internet Archive hoovered up a good bit of the contents via "several deep collection crawls." So for your searching and browsing pleasure, the archive now offers the GeoCities Special Collection 2009, which incorporates GeoCities content that was already archived before the intensive final effort (archive.org/web/geocities.php).

The Internet Archive and its Wayback Machine are pretty much universally loved by information professionals. Outside of our profession, however, I'm not really sure how many people know about them or have used them. Even some highly educated folks haven't, such as the colleague who came into my office stressed out because she couldn't find a report that was included in the bibliography of an article she was reading. The citation's URL was a dead link. "I tried searching for it on Google," she said, "but all I could find were a couple more links that didn't work." I asked if she'd tried the Wayback Machine. She looked puzzled and asked what it was.

Teachable Moment

I angled my computer monitor so she could look and went to the Wayback Machine. I cut and pasted her dead URL into the text box and clicked Browse History. In this case, I thought it was probably best to scroll back through the calendar pages to the first highlighted date, when the URL was initially crawled. …
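
For readers who would rather script this lookup than click through it, here is a minimal sketch of the same idea. It only builds the addresses; it does not contact the Internet Archive. The function names and the sample report URL are hypothetical, and the two URL patterns (the calendar view at web.archive.org/web/*/ and the availability lookup at archive.org/wayback/available) are assumed to work the way the Internet Archive publicly describes them. Asking for the capture closest to a very early date is one way to approximate "the first highlighted date."

```python
from urllib.parse import urlencode


def calendar_url(dead_url: str) -> str:
    """Build the address of the Wayback Machine calendar view for a URL.

    Assumes the public pattern web.archive.org/web/*/<url>, which lists
    every date on which the page was captured.
    """
    return f"https://web.archive.org/web/*/{dead_url}"


def availability_query(dead_url: str, timestamp: str = "19960101") -> str:
    """Build an availability lookup for the capture closest to `timestamp`.

    The default timestamp (YYYYMMDD) is deliberately early, on the
    assumption that the closest capture to it will be the earliest one.
    """
    query = urlencode({"url": dead_url, "timestamp": timestamp})
    return f"https://archive.org/wayback/available?{query}"


if __name__ == "__main__":
    # Hypothetical dead link copied from a bibliography.
    dead = "example.com/reports/annual-report.pdf"
    print(calendar_url(dead))
    print(availability_query(dead))
```

Pasting either printed address into a browser should land on the same material the text box and Browse History button lead to, provided the page was ever crawled.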