Magazine article Online

How to Use Controlled Vocabularies More Effectively in Online Searching

Article excerpt

INTRODUCTION

We have long recognized that powerful retrieval in online searching can be gained through the combined use of natural language and controlled vocabularies. The idea of a "controlled vocabulary," however, does not represent a single theory or approach to indexing or classification. There are actually many types of controlled vocabularies in databases. Often a single database will contain several types. Effective use of these vocabularies requires a strategic understanding of which types of classification and indexing are involved, and taking advantage of the particular mix of vocabularies in a given database to achieve optimum retrieval.

Most database controlled vocabularies date from the days when the databases were print products only, and represent a variety of theories on indexing and classification. In this article I will first describe and explain seven common types of subject vocabularies in databases. Second, I will describe some specific search techniques for taking advantage of the strengths of particular types of vocabularies. Finally, I will suggest an overall strategy for identifying and using vocabulary types when approaching a new database.

SPECTRUM OF APPROACHES TO SUBJECT DESCRIPTION

Seven Types of Subject Description. Figure 1 lists seven major types of subject description that can be found in online and CD-ROM databases today. For convenience of discussion, these types are arrayed from broad to specific; that is, typical terms or categories in types that appear higher in the list are broader and more general than typical terms or categories in types that are lower in the list. For example, a typical category in the NTIS classification, a hierarchical classification, may be an entire academic discipline, such as geography or civil engineering, while a typical descriptor in a science thesaurus might be a particular metal alloy or a chemical compound.

Note that there exists a very large number of types of controlled vocabularies: in fact, almost every specific vocabulary has some unique features. With some particular vocabularies, therefore, the order of types in Figure 1 would be different. I am grouping and generalizing types for the sake of simplifying our discussion of what is, in fact, a very complex topic. Furthermore, some people would be uncomfortable with the idea of calling classification categories "vocabulary." We will use that term here to promote our understanding of the full array of subject approaches to information in databases.

Controlled Vocabulary and Natural Language. The top six types of vocabulary in the list can all be considered forms of controlled vocabulary, while the last, natural language, is uncontrolled. For our purposes here, "natural language" refers to the text in the record as prepared by the original writer of the document or abstract. Such text follows the rules only of ordinary, "natural," speaking and writing; it is not an artificial language designed just for information retrieval.

"Controlled vocabulary" refers to index terms or classification codes that have been created to provide consistent and orderly description of the contents of documents or records. Such vocabulary may be "controlled" in one or more of several ways:

* by limiting many of the normal linguistic variations in natural language (regulating whether terms appear in singular or plural, permitting only certain verb endings, etc.)

* by regulating the word order and structure of phrases, and

* by cutting down the number of synonyms or near-synonyms so that only one way of describing a given topic is allowed in the vocabulary.
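The three kinds of control listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy vocabulary, the table names, and the function `to_descriptor` are hypothetical and are not drawn from any real thesaurus or database. It shows only how a searcher's natural-language phrase might be reduced to the single allowable term.

```python
from typing import Optional

# Hypothetical toy vocabulary; these terms are illustrative only.
DESCRIPTORS = {"automobiles", "information retrieval"}

# Control of linguistic variation: this toy vocabulary allows plural forms only.
PLURAL_FORM = {"automobile": "automobiles", "car": "cars"}

# Control of word order: inverted phrases are mapped to direct order.
DIRECT_ORDER = {"retrieval, information": "information retrieval"}

# Control of synonyms: each variant points to the one allowed term.
USE_INSTEAD = {"cars": "automobiles", "motor cars": "automobiles"}


def to_descriptor(phrase: str) -> Optional[str]:
    """Return the allowed descriptor for a phrase, or None if there is none."""
    # Normalize case and spacing before any lookup.
    key = " ".join(phrase.lower().split())
    key = PLURAL_FORM.get(key, key)    # singular -> plural
    key = DIRECT_ORDER.get(key, key)   # inverted -> direct word order
    key = USE_INSTEAD.get(key, key)    # synonym -> preferred term
    return key if key in DESCRIPTORS else None


if __name__ == "__main__":
    for phrase in ["Car", "Retrieval,  Information", "bicycles"]:
        print(phrase, "->", to_descriptor(phrase))
```

A phrase that maps to no allowable term returns `None`, which in this sketch stands for the point at which a searcher would fall back on natural language.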

In addition, aids to indexers and searchers may be provided in the form of cross-references between terms and scope notes defining terms closely. (Still other features, to be described below, can be found in classifications.) Typically, with a controlled vocabulary, a list or thesaurus of allowable terms or categories is developed, and both indexers and searchers consult that list when using controlled vocabulary. …
