Magazine article Online

Archiving the Web

Article excerpt

I hope that you have discovered the Wayback Machine [www.archive.org] by now--that way-cool collection of 30 billion (yes, billion) Web pages dating back to 1996. It's a splendid tool for tracking down material that was removed from a Web site, finding out what a company used to say about itself, and getting a chuckle at the funky-looking Web sites from the early days of HTML coding.

But one of the big limitations of the Wayback Machine has been access points--the only way you could retrieve an archived page was by the URL. If you were tracking down the Web page of a defunct company and you didn't know its Web address, you were out of luck. (OK, nit-pickers--you could see if it was still listed in one of the search engines' indexes, but as the search engines strive to be not only biggest but freshest, they're purging 404 pages faster than ever.)

To the delight of Web researchers, Anna Patterson has developed CobWebSearch [http://recall.archive.org], a search engine covering 11 billion of the archive's pages that lets searchers retrieve pages through a full-text search. As of press time, it's still in beta, with a full launch expected in mid-October.

The search functionality is fairly simple. If you type multiple words into the search box, CobWebSearch first searches for the words as an exact phrase; if there are no results, it applies an AND operator, and finally, an OR operator. Interestingly, enclosing a phrase in double quotes doesn't force an adjacency relationship; CobWebSearch still tries a phrase search, then AND, then OR. Search results are ranked by calculated relevance, and if you have cookies enabled, the ranking will take into account your interests as evidenced by prior searches.
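For readers who think in code, the fallback order described above can be sketched in a few lines of Python. This is only an illustration of the phrase, then AND, then OR logic over a toy in-memory collection; the function names and the `pages` dictionary are hypothetical, and nothing here reflects how CobWebSearch is actually implemented (relevance ranking and cookie-based personalization are left out).

```python
def contains_phrase(tokens, words):
    """True if `words` appears as a consecutive run inside `tokens`."""
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def tiered_search(query, pages):
    """Return (mode, page_ids), trying phrase, then AND, then OR.

    `pages` is a hypothetical {page_id: text} mapping. Matching is
    case-sensitive, and double quotes are stripped, so a quoted query
    behaves the same as an unquoted one.
    """
    words = query.replace('"', "").split()
    if not words:
        return "none", []
    tokenized = {pid: text.split() for pid, text in pages.items()}

    # Tier 1: the words as an exact phrase.
    hits = [pid for pid, toks in tokenized.items() if contains_phrase(toks, words)]
    if hits:
        return "phrase", hits

    # Tier 2: every word present somewhere on the page (AND).
    hits = [pid for pid, toks in tokenized.items() if all(w in toks for w in words)]
    if hits:
        return "and", hits

    # Tier 3: at least one word present (OR).
    hits = [pid for pid, toks in tokenized.items() if any(w in toks for w in words)]
    return "or", hits


pages = {
    1: "space tourism is coming",
    2: "tourism in space",
    3: "space travel",
}
print(tiered_search("space tourism", pages))
print(tiered_search('"tourism space"', pages))
print(tiered_search("tourism travel", pages))
```

In this toy collection the first query is satisfied at the phrase tier, the quoted second query falls through to AND, and the third only matches at the OR tier.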

In an annoying deviation from search engine conventions, CobWebSearch is case-sensitive. A search for space tourism (all lowercase) turned up 20,207 hits, whereas Space Tourism (capitalized) found 1,279,390. Fortunately, CobWebSearch does suggest capitalization variants on its search results page, but who would remember to click this option?
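If you would rather not rely on remembering that option, you can list the capitalization variants of a query yourself and try each one. The helper below is a minimal sketch; `case_variants` is a hypothetical name, and the set of variants it produces is an assumption, not a description of what CobWebSearch suggests.

```python
def case_variants(query):
    """Return the distinct capitalization variants of a query, original first."""
    candidates = [
        query,               # as typed
        query.lower(),       # space tourism
        query.upper(),       # SPACE TOURISM
        query.title(),       # Space Tourism
        query.capitalize(),  # Space tourism
    ]
    variants = []
    for candidate in candidates:
        if candidate not in variants:  # drop duplicates, keep order
            variants.append(candidate)
    return variants


for variant in case_variants("space tourism"):
    print(variant)
```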

In what seems to be a feature appearing in better search engines everywhere, CobWebSearch does some ad hoc categorization of the search results. There is usually a pull-down menu for a People category, although in some of my tests, the entries under People included subject terms. Other categories are created on the fly, based on the content of the retrieved Web pages. …
