Magazine article Online

Maximizing Relevant Retrieval

Article excerpt

The common wisdom among search professionals is that Boolean searching is the most practical approach to locating relevant information from bibliographic databases. But is it?

Today's information professionals are using natural language searching with Web search engines, such as AltaVista and Excite, and to a lesser extent with TARGET (Dialog) and FreeStyle (LEXIS-NEXIS). Natural language search systems identify relevant retrieval differently, and those who configure the natural language search software reveal very little about its process. As Barbara Quint said in a 1994 article in Wilson Library Bulletin, natural language systems use a "complex series of algorithms to analyze statistical counts of terms (the number of terms in each document, frequency of terms in the document compared to frequency of terms in the database, etc.). The more often a concept appears in a document, the greater weight it is given." She also said, "Documents with phrases where query words occur closer together are ranked more highly than documents in which the query words are scattered. These statistical techniques are in turn used to drag the most relevant references to the top of the list" [1].
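To make the statistical idea in Quint's description concrete, here is a minimal sketch in Python. It is not the algorithm used by any of the systems named above, which are not published. It assumes a simple term-frequency weight scaled by how rare the term is across the collection, and it leaves out the proximity factor. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def rank(query, documents):
    """Return document indices ordered from highest to lowest score."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Number of documents in which each term appears (database frequency).
    doc_freq = Counter(term for tokens in doc_tokens for term in set(tokens))
    scores = []
    for tokens in doc_tokens:
        counts = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if counts[term]:
                # Frequency in the document, relative to document length.
                tf = counts[term] / len(tokens)
                # Terms that are rare in the database get more weight.
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += tf * idf
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


docs = [
    "boolean searching in bibliographic databases",
    "natural language searching and relevance ranking of natural language queries",
    "library reference desk questions",
]
print(rank("natural language", docs))  # the second document ranks first
```

The document that repeats the query terms most often, relative to its length, rises to the top. A real system would also reward query words that occur close together, as the quotation notes.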

In an ONLINE article in May 1994, Carol Tenopir and Pam Cahn suggested using relevance searching in situations where the searcher is only looking for "a few high relevant items" or when "concepts are of unequal weight" [2]. According to Susan Feldman in another ONLINE article, "Good relevance ranking systems return what you ask for, as well as what you almost asked for, and sometimes what you didn't ask for, but wish you had" [3]. Natural language searching is good for vague or broad questions. The searcher must be willing to tolerate less relevant and even unrelated items in the retrieved set.


Relevance ranking/natural language searching and Boolean searching are not mutually exclusive. Natural language is easier for end-users to use and it can outperform Boolean "by expanding the recall/precision envelope" [3]. And, while Boolean searches are precise, natural language searches are comprehensive.

To achieve the best overall results, sometimes even professional searchers need to employ both natural language searching and Boolean searching, according to the results of our test searches run recently in EBSCOhost.


To test the difference in retrieval for each type of searching, we ran 100 searches using EBSCOhost's Academic Search Full-Text Elite database. Each search was run twice: once in the Boolean mode and then again in the natural language search mode. We printed and compared the first ten hits retrieved in each mode, evaluating accuracy, relevance, and overlapping citations.
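The overlap comparison described here was done by hand from printouts. As a rough illustration only, the same check can be sketched in Python. The citation identifiers and the function name are hypothetical, and the sketch assumes each result list is already in ranked order.

```python
def top_ten_overlap(boolean_hits, natural_hits, k=10):
    """Return citations that appear in the first k hits of both result lists."""
    natural_top = set(natural_hits[:k])
    # Keep the Boolean ranking order for the citations found in both lists.
    return [citation for citation in boolean_hits[:k] if citation in natural_top]


# Hypothetical citation identifiers for one test query.
boolean_hits = ["A1", "A2", "A3", "A4"]
natural_hits = ["A3", "B7", "A1", "B9"]
print(top_ten_overlap(boolean_hits, natural_hits))  # ['A1', 'A3']
```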

The topics for the 100 queries were taken from newspaper headlines, Reader's Guide to Periodical Literature subject headings, and questions asked at our library reference desk.

EBSCOhost's Academic Search Full-Text Elite has indexing and abstracting for 3,100 journals, including 1,000 full-text. It can run in two modes: Keyword Search (allows Boolean operators) and Natural Language Search. EBSCOhost defines its natural language system as follows: "allows the user to query the database using words, phrases or even complete sentences. The results of a query are presented in ranked order with the most relevant article being presented first. A result can be found even if the record does not contain all of the words from the query. The more words that appear in an article, the more relevant the record is and the closer to the top of the Result List it will appear."
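EBSCOhost's definition can be read as a simple rule: count how many of the query words a record contains, and list the records with the most matches first. The Python sketch below illustrates that reading only. It is an assumption about how such ranking could work, not EBSCOhost's actual implementation, and the function name and sample records are hypothetical.

```python
def rank_by_matched_words(query, records):
    """Order records by how many distinct query words each one contains."""
    query_words = set(query.lower().split())
    scored = []
    for record in records:
        record_words = set(record.lower().split())
        matched = len(query_words & record_words)
        # A record qualifies even if it lacks some of the query words,
        # but a record with no matching words is left out.
        if matched:
            scored.append((matched, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


records = [
    "address book software",
    "gettysburg address text",
    "civil war history",
]
print(rank_by_matched_words("gettysburg address speech", records))
# ['gettysburg address text', 'address book software']
```

No record contains the word "speech", yet two records are still returned, which is the behavior the definition describes.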

The librarians formulated the appropriate Boolean query or proximity search for each topic. An example of a Boolean search is (Jones OR Lewinsky) AND Clinton. A proximity search could be phrased as gettysburg address. The searchers inspected the first ten retrieved citations from each search. …
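The two example searches can be expressed as simple tests on the words of a record. This Python sketch is illustrative only: it assumes the proximity search means the two words must appear next to each other in order, and it ignores punctuation and word stemming. The function names and sample sentences are hypothetical.

```python
def words(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def matches_boolean(text):
    """Evaluate (Jones OR Lewinsky) AND Clinton against one record."""
    w = set(words(text))
    return ("jones" in w or "lewinsky" in w) and "clinton" in w


def matches_phrase(text, phrase):
    """Return True if the words of the phrase occur adjacent and in order."""
    tokens = words(text)
    target = words(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


print(matches_boolean("Clinton responds to Jones lawsuit"))   # True
print(matches_boolean("Jones wins local election"))           # False
print(matches_phrase("Lincoln delivered the Gettysburg Address in 1863",
                     "gettysburg address"))                   # True
print(matches_phrase("an address given at Gettysburg",
                     "gettysburg address"))                   # False
```

Unlike the ranked approaches, each test returns a plain yes or no: a record either satisfies the query or is excluded from the result set.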
