An Evolutive Process to Convert Glossaries into Ontologies

This paper describes a method to generate ontologies from glossaries of terms. The proposed method presupposes an evolutionary life cycle based on successive transformations of the original glossary that lead to intermediate knowledge representation products (dictionary, taxonomy, and thesaurus). Each product is semantically more expressive than the one obtained in the previous transformation, with the ontology as the end product. Although this method has been applied to produce an ontology from the "IEEE Standard Glossary of Software Engineering Terminology," it could be applied to a glossary of any knowledge domain to generate an ontology that may be used to index or search for information resources and documents stored in libraries or on the Semantic Web.


From the point of view of their expressiveness or semantic richness, knowledge representation tools can be classified at four levels. At the basic level (level 0), to which dictionaries belong, tools include definitions of concepts without formal semantic primitives. At the taxonomies level (level 1), tools include a vocabulary, implicit or explicit, as well as descriptions of specialized relationships between concepts. At the thesauri level (level 2), tools further include lexical (synonymy, hyperonymy, etc.) and equivalence relationships. At the reference models level (level 3), tools combine the previous relationships with other, more complex relationships between concepts to completely represent a certain knowledge domain. (1) Ontologies belong at this last level.

According to the hierarchical classification above, knowledge representation tools at a given level add semantic expressiveness to those at the lower levels, so that a dictionary or glossary of terms might develop into a taxonomy or a thesaurus, and later into an ontology. There are various comparative studies of these tools, (2) as well as several proposals for systematically generating ontologies from lower-level knowledge representation systems, especially from descriptor thesauri. (3)
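The four-level classification and its cumulative character can be illustrated with a small data structure. The following Python sketch is only an illustration under stated assumptions: the names `Level`, `ADDED_AT_LEVEL`, `features`, and `evolution_path` are hypothetical, and the feature labels merely paraphrase the descriptions given above; none of this is part of the method proposed in this paper.

```python
from enum import IntEnum
from typing import List, Set


class Level(IntEnum):
    """Expressiveness levels of knowledge representation tools (names are illustrative)."""
    DICTIONARY = 0       # basic level: definitions without formal semantic primitives
    TAXONOMY = 1         # vocabulary plus specialized relationships
    THESAURUS = 2        # adds lexical and equivalence relationships
    REFERENCE_MODEL = 3  # ontologies belong here


# What each level adds over the level below it.
# The labels paraphrase the text; they are assumptions, not a formal vocabulary.
ADDED_AT_LEVEL = {
    Level.DICTIONARY: {"definitions"},
    Level.TAXONOMY: {"vocabulary", "specialized relationships"},
    Level.THESAURUS: {"lexical relationships", "equivalence relationships"},
    Level.REFERENCE_MODEL: {"complex domain relationships"},
}


def features(level: Level) -> Set[str]:
    """Return everything a tool at `level` can express, accumulated from level 0 upward."""
    result: Set[str] = set()
    for lower in Level:
        if lower <= level:
            result |= ADDED_AT_LEVEL[lower]
    return result


def evolution_path(start: Level, end: Level) -> List[Level]:
    """List the successive products obtained when a tool at `start` is developed up to `end`."""
    if start > end:
        raise ValueError("a tool can only evolve toward a higher level")
    return [lvl for lvl in Level if start <= lvl <= end]
```

For instance, `evolution_path(Level.DICTIONARY, Level.REFERENCE_MODEL)` lists the dictionary, taxonomy, thesaurus, and reference model in that order, and `features(Level.THESAURUS)` contains everything a taxonomy expresses plus the lexical and equivalence relationships.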

This paper proposes a process for generating a terminological ontology from a dictionary of a specific knowledge domain. (4) Given the definition offered by Neches et al. ("an ontology is an instrument that defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary"), (5) it is evident that creating an ontology will be easier if there is an existing vocabulary to extend than if the ontology is developed from scratch.

If the developed ontology is based exclusively on the dictionary, the outcome will be limited by the richness of the definition of terms included in that dictionary. It would be what is normally called a "lightweight" ontology, (6) which could later be converted into a "heavyweight" ontology by implementing, in the form of axioms, knowledge not contained in the dictionary. This paper describes the process of creating a lightweight ontology of the domain of software engineering, starting from the IEEE Standard Glossary of Software Engineering Terminology. (7)

Ontologies, the Semantic Web, and Libraries

Within the field of librarianship, ontologies are already being used as alternative tools to traditional controlled vocabularies. This may be observed particularly within the realm of digital libraries, although, as Krause asserts, objections to their use have often been raised by the digital library community. (8) One of the core objections is the difficulty of creating ontologies as compared to other vocabularies such as taxonomies or thesauri. Nonetheless, the semantic richness of an ontology offers a wide range of possibilities concerning indexing and searching of library documents.

The term ontology (used in philosophy to refer to the "theory about existence") has been adopted by the artificial intelligence research community to define a categorization of a knowledge domain in a shared and agreed form, based on concepts and relationships, which may be formally represented in a computer-readable and usable format. …

