Academic journal article Information Technology and Libraries

Seeing the Wood for the Trees: Enhancing Metadata Subject Elements with Weights

Article excerpt

Subject indexing has been conducted in a dichotomous way: either the information object is primarily about (or of) a subject or it is not, corresponding to the presence or absence of a particular subject term. With more subject terms brought into information systems via social tagging, manual cataloging, or automated indexing, many more partially relevant results can be retrieved. Using examples from digital image collections and online library catalog systems, we explore the problem and advocate adding a weighting mechanism to subject indexing and tagging to make web search and navigation more effective and efficient. We argue that the weighting of subject terms is more important than ever in today's world of growing collections, more federated searching, and the expansion of social tagging. Such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers, but also incorporated into system functionality and metadata schemas.


Subjects, as important access points, have largely been indexed in a dichotomous way: either the object is primarily about (or of) something or it is not. This approach to indexing is implicitly assumed in various guidelines for subject indexing. For example, the Dublin Core Metadata Element Set recommends the use of controlled vocabulary to represent subject in "keywords, key phrases, or classification codes." (1) Similarly, the Library of Congress practice, suggested in the Subject Headings Manual, is to assign "one or more subject headings that best summarize the overall contents of the work and provide access to its most important topics." (2) A topic is only "important enough" to be given a subject heading if it comprises at least 20 percent of a work, except for headings of named entities, which do not need to be 20 percent of the work when they are "critical to the subject of the work as a whole." (3) Although catalogers judge the relative weight of topics when they assign terms, this weight information is left out of current library metadata schemas and practice.
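The 20 percent guideline can be expressed as a simple threshold test. The sketch below is illustrative only: it assumes that a cataloger's judgment of topic coverage can be written down as a proportion of the work, which the Subject Headings Manual does not require, and the names `Topic`, `assign_headings`, and the sample headings are hypothetical. Its point is that the coverage values guide the decision but do not survive into the resulting list of headings.

```python
from dataclasses import dataclass

# Share of a work a topic must cover to earn a heading (Subject Headings Manual).
THRESHOLD = 0.20

@dataclass
class Topic:
    heading: str
    coverage: float            # hypothetical: estimated share of the work, 0.0-1.0
    named_entity: bool = False
    critical: bool = False     # entity judged critical to the work as a whole

def assign_headings(topics):
    """Apply the 20 percent rule and the named-entity exception.

    Only the heading strings are returned; the coverage values that drove
    each decision are discarded, as in current metadata records.
    """
    return [t.heading for t in topics
            if t.coverage >= THRESHOLD or (t.named_entity and t.critical)]

topics = [
    Topic("Forests", 0.60),
    Topic("Birds", 0.25),
    Topic("Insects", 0.10),
    Topic("Walden Pond (Mass.)", 0.05, named_entity=True, critical=True),
]
print(assign_headings(topics))  # ['Forests', 'Birds', 'Walden Pond (Mass.)']
```

In this example, a topic covering 60 percent of the work and one covering 25 percent end up as indistinguishable entries in the record.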

A similar practice applies to subject indexing of non-textual objects. Because of the difficulty of selecting words to represent visual/aural symbolism, subject indexing for art and cultural objects is usually guided by Panofsky's three levels of meaning (pre-iconographical, iconographical, and post-iconographical), further refined by Layne into "ofness" and "aboutness" at each level. Specifically, what can be indexed includes the "ofness" (what the picture depicts) as well as some "aboutness" (what is expressed in the picture) at both the pre-iconographical and iconographical levels. (4) In practice, VRA Core 4.0, for example, defines subject subelements as:

   Terms or phrases that describe, identify, or interpret the Work or Image and what it depicts or expresses. These may include generic terms that describe the work and the elements that it comprises, terms that identify particular people, geographic places, narrative and iconographic themes, or terms that refer to broader concepts or interpretations. (5)

Here again, no weighting or differentiating mechanism is included in describing the multiple elements. What is addressed is the "what" problem: What is the work of or about? Metadata schemas for images and artworks, such as VRA Core and CDWA, focus on the specificity and exhaustivity of indexing, that is, the precision and quantity of terms applied to a subject element. However, these schemas do not address the question of how much the work is of or about the item or concept represented by a particular keyword.
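One way to record the missing "how much" is to pair each subject term with a numeric weight. The sketch below assumes a hypothetical weight between 0 and 1 attached to each term; neither VRA Core nor CDWA defines such a field, and the record identifiers, terms, and function names are invented for illustration. It contrasts dichotomous retrieval, which returns every record containing the term, with retrieval ranked by weight.

```python
# Hypothetical records: each subject term carries a weight in (0, 1]
# indicating how much the work is of or about that term.
RECORDS = {
    "img-001": {"tree": 0.9, "river": 0.3},
    "img-002": {"tree": 0.1, "house": 0.8},
    "img-003": {"river": 0.7, "tree": 0.5},
}

def boolean_search(records, term):
    """Dichotomous retrieval: all records that have the term, in ID order."""
    return sorted(rid for rid, subjects in records.items() if term in subjects)

def weighted_search(records, term, min_weight=0.0):
    """Return records having the term with weight >= min_weight,
    most heavily weighted first."""
    hits = [(subjects[term], rid) for rid, subjects in records.items()
            if term in subjects and subjects[term] >= min_weight]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [rid for _, rid in hits]

print(boolean_search(RECORDS, "tree"))        # ['img-001', 'img-002', 'img-003']
print(weighted_search(RECORDS, "tree"))       # ['img-001', 'img-003', 'img-002']
print(weighted_search(RECORDS, "tree", 0.2))  # ['img-001', 'img-003']
```

With only the boolean search, the record in which trees are incidental (`img-002`) is as prominent as the one that is chiefly about trees; the weights let a system rank it lower or filter it out.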

Recently, social tagging functions have been adopted in digital library and catalog systems to support better searching and browsing. This introduces more subject terms into the system. Yet again, there is typically no mechanism to differentiate between the tags applied to any given item, except on a few sites that use tag frequency information in their search interfaces. …
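The tag-frequency approach mentioned above can be sketched as follows. This is an assumption about how such weights might be derived, not a description of any particular site: here a tag's weight for one item is the share of distinct users who applied it, with case folded and repeated taggings by the same user counted once. The input format and the name `tag_weights` are hypothetical.

```python
from collections import Counter

def tag_weights(taggings):
    """Derive per-tag weights for a single item.

    taggings: iterable of (user, tag) pairs for one item (hypothetical format).
    Tags are case-folded, and a user applying the same tag twice counts once.
    Weight = number of distinct users who applied the tag / number of taggers.
    """
    unique = {(user, tag.casefold()) for user, tag in taggings}
    taggers = {user for user, _ in unique}
    counts = Counter(tag for _, tag in unique)
    return {tag: n / len(taggers) for tag, n in counts.items()}

taggings = [("ann", "forest"), ("bob", "Forest"), ("cat", "forest"),
            ("ann", "fog"), ("bob", "hiking"), ("ann", "forest")]
weights = tag_weights(taggings)
print(weights["forest"])         # 1.0
print(round(weights["fog"], 2))  # 0.33
```

Normalizing by the number of taggers, rather than using raw counts, keeps weights comparable between heavily and lightly tagged items; other normalizations are equally plausible.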
