Magazine article Online

Finding and Using the Magic Words: Keywords, Thesauri, and Free Text Search

Article excerpt

A major characteristic that distinguishes information professionals from casual Web searchers is our determination to find the best, most appropriate search terms. We sometimes construct elaborate search strategies to obtain highly relevant, timely, and accurate information. We don't put a couple of words in a search box and trust that the top 10 results will fulfill our information need. In fact, information professionals consider the "magic words" of controlled vocabularies to be our ace in the hole. We rely on these to find information that our clients can't, and we revel in their astonished looks when we succeed where they failed.

We understand the arcane language of thesauri, taxonomies, and ontologies. We recognize the limitations and the advantages of free text searching and know when to combine it with thesaurus terms. Who else, when perusing a library catalog, would automatically search for cookery rather than cook books? The Library of Congress Subject Headings have come in for their share of ridicule--and many of the truly strange headings have been rectified. My first professional job was as a cataloger for a major multinational commercial bank, and I still remember hating the subject heading interest and usury. It's not there anymore, since cooler heads realized the terms are not synonymous. (EBSCO's Business Source Premier has a "see" reference from interest & usury to interest.) In 1998, it was big news that LC changed moving-pictures to motion pictures. But today's researcher on the industry is much more likely to use films or film industry as a search term. LC semi-accommodates this, having added terms such as feature films and adventure films to its vocabulary.

Perhaps a library catalog search using the prescribed term motion pictures will return highly relevant books. But take that to the Web and things change. It's common language, not controlled vocabulary, that rules the day. Statistically, far more Web pages probably use some form of the word film, so searching on it is more likely to return relevant results. Sundance is a "film festival," not a "motion picture festival"--although ABI/INFORM uses motion picture festivals as its thesaurus term. Trying to fit old words to new concepts is like trying to squeeze a size 8 foot into a size 6 shoe.


In business literature, terminology changes in an almost faddish fashion at times. A thesaurus becomes a moving target (not to be confused with a moving picture). New technologies affecting industries are hard to pin down, terminologically speaking, when they first appear. What, for example, to make of YouTube? Suppose your research project was determining corporate use of YouTube, perhaps precipitated by JetBlue's CEO filming an apology after the airline stranded passengers last February, which the company uploaded to YouTube. Given the unique product name, you don't need controlled vocabulary to search for YouTube; you can simply search the name.

The good researcher will recognize that competitors to YouTube exist and should be added to the strategy for your research project to obtain comprehensive retrieval. To do this, you either OR together the competitor names (assuming you can determine them) or look for a controlled vocabulary term. At LC's Web site, there's one book with YouTube in the title that's been assigned subject headings. Actually, it's only been assigned one, Internet videos. Search that term in the subject heading field, and only the one book appears. This tautology is reminiscent of the distinction early databases made between descriptors and identifiers. The former were controlled vocabulary; the latter were uncontrolled vocabulary. Identifiers, in my mind, were a precursor to the current tagging phenomenon.
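
For readers who like to see a strategy spelled out, here is a minimal Python sketch of the approach described above: OR the product names together as free text, then OR in a controlled vocabulary term restricted to a subject field. It is an illustration only. The competitor names are placeholders, since you would have to determine the real ones yourself, and the SU(...) field syntax and the build_query helper are assumptions, not the syntax of any particular database.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(names: list[str], subject_terms: list[str],
                subject_field: str = "SU") -> str:
    """OR together free-text names and field-qualified controlled-vocabulary terms.

    The subject_field label ("SU") is a hypothetical field code; real
    systems each use their own syntax for subject-heading searches.
    """
    free_text = [quote(n) for n in names]
    controlled = [f"{subject_field}({quote(t)})" for t in subject_terms]
    return " OR ".join(free_text + controlled)


query = build_query(
    # "Competitor A" and "Competitor B" are placeholders, not real services.
    names=["YouTube", "Competitor A", "Competitor B"],
    # "Internet videos" is the LC subject heading mentioned in the article.
    subject_terms=["Internet videos"],
)
print(query)
# prints: YouTube OR "Competitor A" OR "Competitor B" OR SU("Internet videos")
```

The free-text half of the query catches common language, and the subject-field half catches records that an indexer has tagged but that never mention a product by name.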

Blogs are another example of potential perils in choosing thesauri terms for new technologies. When they first appeared, only a few years ago, the common name was "Weblogs. …
