Cross-Lingual Name and Subject Access: Mechanisms and Challenges

This paper considers issues surrounding name and subject access across languages and cultures, particularly mechanisms and knowledge organization tools (e.g., cataloging, metadata) for cross-lingual information access. The author examines current mechanisms for cross-lingual name and subject access and identifies major factors that hinder cross-lingual information access. The author provides examples from the Korean language that demonstrate the problems with cross-language name and subject access.

**********

Today's global information society, benefiting from rapidly advancing communication technologies, spans geographical, linguistic, and cultural boundaries. Recognition of the need for knowledge organization and integration, and for access to cross-cultural and cross-lingual resources, has greatly increased. The 2004 ISKO International Conference on "Knowledge Organization and the Global Information Society" and a 2004 special issue of Cataloging and Classification Quarterly ("Knowledge Organization and Classification in International Information Retrieval") are two examples of this growing recognition. (1) International digitization projects have opened access to medieval texts as well as images and primary sources housed in libraries and institutions around the world, greatly advancing global access to multicultural resources.

The technological revolution that brought forth the global information society has also spurred recognition of the necessity for international collaboration aimed at multicultural education and diversity. (2) The linguistics and computational linguistics communities have collaborated in developing multilingual information resource discovery tools, such as concept-based indexing, which are used primarily for cross-lingual information processing. One example is EuroWordNet, which is based on Princeton University's WordNet, a lexical database for the English language. (3) The Open Language Archives Community (OLAC) has also been engaged in archiving, disseminating, and preserving language and cultural resources, including language-engineering tools, using the Dublin Core metadata standard. (4)

The challenges of accessing resources across cultures and languages suggest this is an area of particular interest to librarians, who are responsible for description and access. As a first step in exploring this topic, the author studied current practices in providing cross-cultural and cross-lingual information access. In this paper, she identifies problem areas and suggests directions for future study. The scope is limited to studies dealing with cataloging and metadata schemes for cross-cultural and cross-lingual information access.

Approaches to Cross-lingual Information Access

The development of cross-lingual thesauri, subject heading lists, and name authorities, as well as the translation of the Dublin Core (DC) metadata scheme into many different languages, is ongoing. In addition to the activities of the DC Metadata Initiative for developing multilingual DC metadata, various approaches to building cross-lingual knowledge organization schemes have been developed with an eye to better access to multicultural and multilingual resources. (5)

The language engineering and linguistics communities have developed lexical tools for cross-lingual resource discovery; these include machine translation, ontologies, information extraction, text summarization, and speech processing. Multilingual information resource discovery tools such as concept-based ontologies (e.g., EuroWordNet and the work of the Global WordNet Association) also have been developed. (6) OLAC has been engaged in archiving, disseminating, and preserving language- and culture-related resources by developing the OLAC Metadata standard, which defines the format used for the interchange of metadata within the framework of the Open Archives Initiative (OAI). (7) The metadata set is based on the complete set of DC metadata terms, but the format allows for the use of extensions to express community-specific qualifiers. …
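
The general idea of a DC-based record that carries a community-specific qualifier can be illustrated with a minimal sketch. It is not the actual OLAC schema: the wrapper element `record`, the extension namespace `http://example.org/community-ext/`, its `code` attribute, and the sample title are all hypothetical and chosen only for illustration. The Dublin Core element namespace and the ISO 639 code `kor` for Korean are the only standard values used.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace.
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical extension namespace standing in for a community-specific
# qualifier vocabulary; this is NOT the real OLAC namespace.
EXT = "http://example.org/community-ext/"
# Namespace of the built-in xml: prefix (used for xml:lang).
XML = "http://www.w3.org/XML/1998/namespace"

ET.register_namespace("dc", DC)
ET.register_namespace("ext", EXT)


def build_record(title, title_lang, language_code):
    """Build a simplified DC-style record with one extension qualifier."""
    record = ET.Element("record")  # hypothetical wrapper element

    # Plain DC element; xml:lang records the language of the title text.
    title_el = ET.SubElement(record, f"{{{DC}}}title")
    title_el.set(f"{{{XML}}}lang", title_lang)
    title_el.text = title

    # DC element refined by a (hypothetical) community-specific attribute
    # that carries a controlled language code.
    lang_el = ET.SubElement(record, f"{{{DC}}}language")
    lang_el.set(f"{{{EXT}}}code", language_code)
    return record


# Sample data is invented for illustration only.
record = build_record("한국어 문법", "ko", "kor")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

In this sketch the base DC elements remain readable by any DC-aware service, while the extra attribute carries the controlled code that a specialized community would rely on for precise cross-lingual retrieval.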