An Analysis of Internet Search Engines: Assessment of over 200 Search Queries


Veteran users of the Internet will undoubtedly recall that, not long ago, they needed to master one or two sets of arcane commands to navigate cyberspace and locate a few pockets of potentially useful information. Archie and his pals Veronica and Jughead were the buzzwords that made the task of unearthing data somewhat more tolerable. With the evolution of cyberspace, a new generation of tools designed to navigate the World Wide Web has transformed the task into a game.

Figure 1 illustrates a site that links users to numerous new-generation World Wide Web search tools, which are variously called "WWW databases," "search engines," topically organized "directories," "Internet indexes," "searchable subject trees," or "meta-search engines." These resources all, to one degree or another, allow the user to retrieve information on the World Wide Web based on the searcher's query entry. They all do this, however, with varying degrees of relevancy and efficiency.

The documents and information available on the Web are vast, and the approaches to organizing them for quick, useful access have been piecemeal. Regularly appearing columns in Computers in Libraries, Information Today, Online, ACRL News, and other library and computing journals, as well as the new and noteworthy Cyberskeptic's Guide to Internet Research newsletter, focus on only a few sites at a time. The Rolodex of resources these print reviews generate can quickly become obsolete, regardless of their quality.[1]

Research Objective

Requests for information from users cover a wide range of topics. Because answers and useful grey literature can often be found on the World Wide Web, librarians and information specialists must exploit the available search tools to pinpoint the location of useful sites. According to Richard Scoville, the most recent count of these search engines puts the number at over 60.[2] As frequent users already know, and as infrequent users quickly discover, these engines often metamorphose to offer new options, new limits, and different features. Given the variability of retrieval results among search engines, this study was undertaken to quantify accurate matches, as compared to matches of arguable quality, for 200 subjects relevant to undergraduate curricula. Both evaluative search tools (engines that provide ratings for Web sites) and nonevaluative search tools were selected for investigation. The disparity in the average number of relevant matches is reported. Thirty of the 200 search queries, together with the performance of each search engine, are presented in the chart on page 61. (The full list is available at the Central Connecticut State University Library Web site at http://neal.ctstateu.edu:2001/htdocs/websearch.html.) Bear in mind, however, that retrieval is subject to change without notice. Searches done on the same day can yield different results; such is the dynamic nature of the World Wide Web.
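For concreteness, the per-engine figure compared in this study can be written as a simple average. The following is a sketch of the presumed computation, not a formula given in the article itself; here r_{e,q} denotes the number of relevant matches returned by engine e for query q:

\[
\bar{r}_e = \frac{1}{200} \sum_{q=1}^{200} r_{e,q}
\]

The disparity reported below is then the spread among these per-engine averages.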

Literature Review

Several papers have appeared in the computing and library literature addressing the usefulness of certain WWW search tools. Many are expository; they describe the search engines' features. Others have attempted to evaluate selected search engines on the basis of the retrieval yielded by various searches. Although most provide good descriptions of the engines under investigation, all fall short of executing a significant number of searches from which to conclude which engine is the most accurate or efficient. Neil Randall executed a "fistful of queries."[4] Martin Courtois and colleagues employed a creative and valid approach: using only three sample questions, the authors identified benchmark Web resources they expected the engines to return in a results list.[5] Stacey Kimmel, in a review of robot-generated searches, tried only two terms: ebola and pollution.[6]

The Web itself is a good source of documents illustrating engine comparisons. Again, unfortunately, query sample size is always deficient. …