This project set out to determine, through empirical research, the effect of keyword choice and demographic features on Internet searching success. An experiment was conducted with 1,109 learners spread across forty-six institutes of higher education on three continents. The relationships between the searching-success rate and each of four factors were then examined: the number of keywords used, age, race, and gender. The results showed that the first three of these have measurable effects on searching success, while gender has only a marginal effect.


The human race has been storing data and information for centuries. Retrieving relevant information effectively and quickly has always been a common problem in this area. The purpose of this paper is to report on a literature study and an empirical experiment on the retrieval of relevant information from the Internet.

Work on the storage and retrieval of information started at approximately the beginning of the third millennium B.C. The Sumerians are credited as being the first people to store and classify written materials into library collections, with the purpose of allowing various social groups to function better. (1) Everyday activities and literature were recorded on clay tablets that were stored in special areas, with only a label bearing the opening words of the document as the sole method of indexing. (2) The physical creation of these clay labels could be viewed as one of the first applications of technology to indexing. In the absence of any advanced technological tools to make information retrieval possible, these libraries were little more than marked collections of documents.

Indexing and simple classification of manuscripts were done during the Middle Ages. As a result of the coding schemes and alphabetical keys in use, the indexers involved in these tasks were surrounded by an aura of mysticism. (3) Work on cataloging started in the Middle Ages, using written card catalogs and guard books.

The current high-powered computer era, in which documents are matched through inverted indexes, string searches, and positional searches, has provided the much-needed technology to empower the storage and retrieval of information. This increasingly powerful technology has removed the economic constraint on the searching mechanism: any characteristic of the document can now be matched to a search query. In fact, there is now no technical constraint to prevent an index from including every single term of a given textual document. An early example of such exhaustive indexing can be seen in Bible concordances such as Strong's Exhaustive Concordance, first published in 1890.
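The idea of an index that holds every term of every document, together with where each term occurs, can be illustrated with a minimal sketch. It is not drawn from the study itself: the function names, the tokenization rule, and the three sample documents are all assumptions made for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every term to {doc_id: [positions]} across all documents."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search_all(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


def search_phrase(index, query):
    """Return the ids of documents where the query terms occur consecutively."""
    terms = tokenize(query)
    hits = set()
    for doc_id in search_all(index, query):
        # A phrase match needs term i to sit exactly i positions after term 0.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[terms[i]][doc_id]
                   for i in range(1, len(terms))):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample documents, invented for this sketch.
docs = {
    1: "Clay tablets were stored with a label bearing the opening words.",
    2: "Card catalogs and guard books were used for cataloging.",
    3: "The opening words of the label were stored on clay.",
}
index = build_index(docs)
print(search_all(index, "clay label"))
print(search_phrase(index, "clay tablets"))
```

Because positions are stored alongside document ids, the same structure supports both plain term matching and positional (phrase) matching, at the cost of an index that grows with the full length of the collection.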

Other Research

A large amount of work has been done on information retrieval in general and on information retrieval from the Internet specifically. Some of these works are compared and reviewed here.

Early Work

The ideal representation of a document is simply the document itself, included whole in the index, but the initial absence and later the limitations of technological tools (such as storage space) made this ideal impossible to achieve.

During the late 1950s and 1960s, noted authors in the area of document-content presentation did landmark work. The controversial Uniterm system sparked interest in the United Kingdom and the United States, leading to the Cranfield tests discussed by Cleverdon and Keen, Robertson, and Tonta. (4) In this system, documents were indexed via a single term (hence the name) that was extracted from the document title or abstract. After some structured tests, Uniterm results were compared to those obtained with more traditional indexing methods. The test apparently broke down owing to disagreement over relevance judgments, and the results were inconclusive. One group of testers claimed that the Uniterm system worked well, while the other claimed the exact opposite. (5)

The actual Cranfield series of tests was done at the College of Aeronautics, Cranfield, UK. …

