Magazine article Online

Evaluating Web Search Results Rankings

Article excerpt

Given that the Internet has been growing at an exponential rate, there are probably tens of millions of pages currently online in cyberspace. Yet by its very nature, the Internet lacks any sort of bibliographic control. Searching for a particular Web page without the use of the proper tools can be tedious at best and useless at worst.

For most users, a Web search engine is the only portal used to navigate the vast array of Web sites on the World Wide Web, making it extremely important to critique, evaluate, and compare the various search engines and their mechanisms. Essentially, the effectiveness of a search engine defines the scope of what the user is "allowed" to find. If a search engine is set up poorly, users may never find what they are looking for. The most relevant Web page to the user's query may be buried forever in the depths of cyberspace.


A major problem with search engines is that search queries turn up far too many results, erring on the side of recall rather than precision. The end user cannot reasonably be expected to evaluate and inspect all of these results. If the Web page most relevant to the user is not located near the beginning of the search results, then it may as well be absent from them.

Since online databases arrived on the research scene some two decades ago, search effectiveness comparisons have relied on two main factors: precision and recall. Precision measures the ratio of relevant documents retrieved to the total number of documents retrieved. The larger the share of retrieved documents that match the query, the higher the precision. Recall measures the ratio of relevant documents retrieved to the total number of relevant documents in a collection. The more of the collection's relevant documents retrieved, the higher the recall.
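
The two ratios can be made concrete with a minimal sketch. It assumes a small, closed collection in which every relevant document is known; the function name and the document IDs are hypothetical and serve only as illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: IDs of the documents the search returned
    relevant:  IDs of every relevant document in the collection
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were retrieved

    # Guard against empty inputs so the ratios are always defined.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Made-up example: 4 documents retrieved, 3 of them relevant,
# and 6 relevant documents in the whole collection.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d3", "d7", "d8", "d9"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this made-up case, three of the four retrieved documents are relevant, and three of the six relevant documents were found. Note that the recall calculation needs the full list of relevant documents, which is exactly what a closed database can supply.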

This type of measurement poses a problem in Internet evaluations. Due to the sheer magnitude of the World Wide Web, it is impossible to determine the outer bounds of precision and recall. Without bibliographic control, it becomes impossible to determine how many Web sites there are on the Internet or how many are relevant to a query. It thus becomes necessary to find new criteria for measuring and comparing Internet search engines.

A plethora of literature exists that compares the various search engines. The comparisons range from the relevancy of search results, to special features, to the user-friendliness of the interface. Here we explore a different avenue of comparison. How do the results of a search engine evaluation compare to the same evaluation completed more than 3 years ago? How do relevancy ranking comparisons change when comparing current user habits to "ideal" user habits?


We based our experiment on the work of Martin P. Courtois and Michael W. Berry ("Results Ranking in Web Search Engines," ONLINE, May/June 1999, pp. 39-46). Courtois and Berry performed comparisons on five popular search engines of the time: AltaVista, Excite, HotBot, Infoseek, and Lycos. They tested the relevancy ranking of 12 multi-term searches in each of the search engines. The search terms used were as follows: credit card fraud, quantity theory of money, liberation tigers, evolutionary psychology, French and Indian war, classical Greek philosophy, Beowulf criticism, abstract expressionism, tilt up concrete, latent semantic indexing, fin synthesis, and pyloric stenosis.

Although research shows that "80 percent of users viewed only the first two pages of results" (Jansen, Bernard J., Amanda Spink, Judy Bateman, and Tefko Saracevic. "Real Life Information Retrieval: a Study of User Queries on the Web." SIGIR Forum 32 No. 1 [1998]: pp. 5-17), Courtois and Berry analyzed the top 100 hits. However, if users look only at the first 20 search results, this analysis could be misleading. To accommodate both current user habits and more idealized searching behaviors, our experiment was conducted twice: first extracting only the top 20 search results, then extracting the top 100, and comparing the relevancy rankings of the two. …
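
To illustrate why the cut-off matters, here is a minimal sketch that scores one ranked result list at both depths. It assumes a simple "precision at k" measure, which is a stand-in and not necessarily the scoring used in the experiment; the function name and the relevance judgments are made up.

```python
def precision_at_k(judgments, k):
    """Share of relevant results among the first k hits.

    judgments: list of booleans in rank order (True = judged relevant)
    """
    top = judgments[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


# Made-up judgments for 100 ranked results: every second hit is relevant
# in the top 20, but only every fifth hit is relevant after that.
judgments = [rank % 2 == 0 for rank in range(20)]
judgments += [rank % 5 == 0 for rank in range(20, 100)]

p20 = precision_at_k(judgments, 20)
p100 = precision_at_k(judgments, 100)
print(f"precision at 20 = {p20:.2f}, precision at 100 = {p100:.2f}")
```

With these invented judgments, the same engine looks considerably stronger when only the first 20 results are examined than when all 100 are, which is why the choice of cut-off can change a comparison.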
