Searching the World-Wide Web: Lycos, WebCrawler and More

As in the print publishing world, the development of finding-aids and indexes must wait for the development of the resources. When anonymous FTP resources multiplied, archie appeared. With the growth of gophers, veronica was born. The explosive growth of World-Wide Web resources in the past year has inspired several contenders for the title of "best Web search engine." The different keyword indexes of Web resources feature a wide variety of search interfaces and capabilities. No clear winner has emerged yet, and the diversity of search engines and databases provides the information professional with multiple choices.

There are many Web keyword indexes, but the best-known are:

* Lycos
* WebCrawler
* World-Wide Web Worm
* Harvest Broker
* CUI

Just as World-Wide Web clients can speak other protocols and connect to gopher, telnet and FTP resources, some Web indexes include more than just Web documents. Some of these search engines permit Boolean searches and other sophisticated search options, but all suffer from the problem of overload.

SYSTEM OVERLOAD

A major problem inherent in successful Internet keyword indexes is that as soon as a particular search tool becomes useful and well-known, it is flooded with users. This in turn makes it less dependable, since the original server is unable to handle the increased load. This happened with the first archie server at McGill University and then with the first veronica server. For both archie and veronica, a partial solution has been to divide the load by multiplying the servers. Many archie servers on different continents now handle the thousands of daily archie searches. The dispersion of veronica servers has occurred along similar lines. This way of dividing the load has been only partially successful: as more servers are set up by generous hosts on the Net, Internet use is multiplying as well. The result is that even with a dozen or more veronica servers, the load (determined by the number of simultaneous search requests) is still too high. It is not uncommon to try an archie or veronica search and get a failed search response due to high system load.

The same situation occurs with Web finding-aids. When a particular index establishes a reputation for successful searches, it attracts a huge increase in traffic. Then users can no longer depend on that resource and must look for an alternative. Most search options for the Web have not yet resulted in a multiplication of servers, but that time may soon arrive. Meanwhile, the different indexes provide alternatives when a particular favorite is unavailable or unbearably slow.
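
The habit of turning to an alternative index when a favorite is overloaded can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Overloaded` exception, the `search_with_fallback` function, and the two stand-in indexes are hypothetical names invented for this sketch, and no real search service is contacted.

```python
class Overloaded(Exception):
    """Hypothetical error: an index refuses a search because of high load."""


def search_with_fallback(query, indexes):
    """Try each (name, search_function) pair in order of preference.

    Returns (name, hits) from the first index that answers. Raises
    Overloaded only if every index in the list fails.
    """
    failures = []
    for name, search_fn in indexes:
        try:
            return name, search_fn(query)
        except Overloaded as exc:
            failures.append(f"{name}: {exc}")
    raise Overloaded("all indexes failed: " + "; ".join(failures))


# Stand-in indexes for demonstration only (assumptions, not real services).
def busy_index(query):
    raise Overloaded("system load too high")


def quiet_index(query):
    return [f"document about {query}"]


if __name__ == "__main__":
    preferred_order = [("favorite", busy_index), ("alternative", quiet_index)]
    print(search_with_fallback("surf", preferred_order))
```

The list order expresses the searcher's preference, so a less popular index is consulted only when the favorite cannot respond.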

LYCOS

Lycos, a project hosted by the computer science department at Carnegie Mellon University, is one of the best-known and most popular indexing tools for the World-Wide Web. When Netscape Navigator was first widely released in late 1994, the people at Netscape Communications Corporation wisely set up a page that listed various Internet search tools (http://www.netscape.com/home/internet-search.html). In one quick-and-dirty comparison, they ranked the tools by the number of results each returned for a simple search on the word surf. Lycos retrieved the most documents and was therefore listed first among the Internet search tools. Due to its prominence on the Netscape Internet Search page, Lycos' load has increased so greatly that it can be difficult to get any response at all.

Although the Lycos database is one of the largest finding-tools, there are other reasons that Lycos searches result in a high number of hits. A single-word search on Lycos defaults to automatic truncation, so the search on surf also retrieves documents containing surface. On multiple-word searches, Lycos defaults to an OR operation. Although the search results are ranked and give preference to records that contain all the search terms, the OR default still retrieves many irrelevant records.
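
To see why these defaults inflate the hit count, consider the following minimal Python sketch. It is a toy model, not Lycos' actual code: the `search` function, the sample `DOCS` collection, and the scoring rule (count of matched terms) are all assumptions made for illustration. It treats every search term as a truncated stem, accepts a record if any term matches (OR), and ranks records that match more of the terms first.

```python
def search(query, documents):
    """Toy keyword search with automatic truncation and a default OR.

    A term matches any word that begins with it, a record is retrieved
    if at least one term matches, and records matching more terms are
    ranked first (ties are broken alphabetically by title).
    """
    terms = query.lower().split()
    results = []
    for title, text in documents.items():
        words = text.lower().split()
        # Count how many query terms match some word by prefix.
        matched = sum(
            1 for term in terms if any(word.startswith(term) for word in words)
        )
        if matched > 0:  # OR: one matching term is enough
            results.append((matched, title))
    results.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in results]


# Hypothetical sample records, invented for this example.
DOCS = {
    "waves": "surf reports for california beaches",
    "geology": "surface features of california",
    "chemistry": "surfactant behaviour in water",
    "cooking": "recipes for pasta",
}

if __name__ == "__main__":
    print(search("surf", DOCS))
    print(search("surf california", DOCS))
```

In this sketch a search on surf retrieves the geology and chemistry records as well as the one about waves, and adding a second term widens the result set rather than narrowing it. Ranking moves the records that match both terms to the top, but the single-term matches remain in the list.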

In the Lycos technical documentation, the developers say, "We plan to upgrade the search engine's language at some future point to implement more standard Boolean operators. …