LINGOES: A Linguistic Ontology Management System

Abstract: The LINGuistic Ontology managEment System (LINGOES) is a framework that enables linguists to take full advantage of Semantic Web technologies. Together with OntoGloss, a text annotation tool, and an RDF database with versioning and querying capabilities, it allows a linguist to mark up any document with classes from one or more ontologies, down to the morpheme level. Textual documents can be in any language as long as they are accessible via a URI (Uniform Resource Identifier). The annotated data can be queried across these languages or used to annotate other documents. Storing the annotated data in an RDF repository with inference, querying and change management capabilities makes annotations in LINGOES accessible to machines and useful to the wider Semantic Web community.

Categories and Subject Descriptors: D.3.2 [Language Classifications]; H.2 [Database Management]; H.2.3 [Languages]

General Terms: Ontology, XML

Keywords: Ontology management system, ontology-based annotation tool

1. Introduction

For linguists, marking up a document is a way of preserving its content. This is more urgent for languages that are in danger of disappearing. Endangered languages can benefit tremendously from an ontology-based annotation system. An ontology, as a way of formalizing knowledge, can help linguists resolve incompatibilities between markup data in multilingual search.

LINGOES provides a framework for linguists to capture knowledge about a specific language and share it across languages. Its use of RDF (Resource Description Framework) as the main storage and exchange format ensures that knowledge in the field is saved in a form that is portable to other applications and readable by machines as well as humans. When a linguist annotates a section, paragraph, word or the morphemes of a word with concepts in the ontology, he/she is expressing knowledge that is relevant and applicable throughout the field. It is important for linguists to start using RDF instead of plain XML, which carries no semantics beyond what a small group of developers mutually agree on. The semantics inherent in RDF and OWL constructs ensure that knowledge is transferable to a larger audience and to the Web community as a whole.
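To make the idea of storing an annotation as RDF concrete, the following is a minimal sketch in Python that describes one annotated text span as RDF triples and serializes them as N-Triples. It is an illustration under stated assumptions, not the LINGOES data model: the `example.org` URIs, the `inDocument` property, the character-offset fragment used to identify a span, and the function names are all hypothetical.

```python
from typing import List, NamedTuple

# Standard RDF vocabulary term for "is an instance of".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical property linking a span to its source document (an assumption).
IN_DOCUMENT = "http://example.org/lingoes#inDocument"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def annotation_triples(doc_uri: str, start: int, end: int, class_uri: str) -> List[Triple]:
    """Describe one annotated character span of a document as RDF triples."""
    # Assumed convention: a span is named by the document URI plus a
    # character-offset fragment. The source does not specify how spans are identified.
    span_uri = f"{doc_uri}#char={start},{end}"
    return [
        Triple(span_uri, RDF_TYPE, class_uri),   # the span is an instance of an ontology class
        Triple(span_uri, IN_DOCUMENT, doc_uri),  # the span belongs to this document
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples)


if __name__ == "__main__":
    triples = annotation_triples(
        "http://example.org/texts/doc1.txt", 3, 6, "http://example.org/onto#Suffix"
    )
    print(to_ntriples(triples))
```

Because each statement names its subject, property and class by URI, any RDF-aware tool can interpret the annotation without a private agreement on element names, which is the contrast with plain XML drawn above.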

LINGOES consists of an annotator (OntoGloss), a Change Management module and an RDF repository. Many text annotators are available, both as open source and as commercial products [4, 9, 10, 13]. What distinguishes a linguistic annotator is that words in linguistics are broken up into morphemes, and OntoGloss is able to annotate the morphemes within a word. For example, if xxxabc is composed of xxx with a suffix -abc, a linguist using OntoGloss can annotate each morpheme separately. During automatic annotation of new documents, when OntoGloss finds yyyabc, it can determine whether it has the same suffix [11] and annotate it with the same class in the ontology. Another contribution of LINGOES, which we are not aware of in any other system, is managing changes and versioning in the underlying ontologies. Our main contributions are:

* A linguistic ontology-based annotator that annotates a document from the most general level down to the morpheme level

* Change Management that allows versioning in ontologies without rendering affected annotations inaccessible

* Using the linguistic knowledge gathered through the community's annotations to automatically annotate other documents at the morpheme level

* Using ontologies from different sources to mark up documents, so that the knowledge gathered in analyzing one language is applicable to other languages.
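The suffix example from the introduction (xxxabc annotated by a linguist, yyyabc found later) can be sketched in Python as follows. This is a simplified illustration only: it uses plain longest-suffix string matching as a stand-in, whereas the approach OntoGloss actually uses is the one cited as [11] and is not described here. The function name, the dictionary of annotated suffixes and the ontology class URI are hypothetical.

```python
from typing import Dict, Optional, Tuple


def split_known_suffix(
    word: str, annotated_suffixes: Dict[str, str]
) -> Optional[Tuple[str, str, str]]:
    """Return (stem, suffix, class_uri) if the word ends in an already annotated suffix.

    annotated_suffixes maps a suffix string to the ontology class URI that a
    linguist previously assigned to it (a hypothetical in-memory store).
    """
    # Try longer suffixes first so that the most specific match wins.
    for suffix in sorted(annotated_suffixes, key=len, reverse=True):
        # Require a non-empty suffix and a non-empty stem in front of it.
        if suffix and len(word) > len(suffix) and word.endswith(suffix):
            stem = word[: -len(suffix)]
            return stem, suffix, annotated_suffixes[suffix]
    return None


if __name__ == "__main__":
    # The linguist has annotated the suffix -abc once, in the word xxxabc.
    annotated = {"abc": "http://example.org/onto#ExampleSuffix"}
    print(split_known_suffix("yyyabc", annotated))
    print(split_known_suffix("yyy", annotated))
```

In this sketch a new word such as yyyabc is split into the stem yyy and the suffix abc, and the suffix inherits the class assigned earlier, while a word without a known suffix yields no automatic annotation. Real morphological analysis has to handle ambiguity and allomorphy, which simple string matching ignores.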

2. LINGOES System Description

Figure 1 shows the architecture of the LINGOES system. It consists of the following modules:

* OntoGloss. OntoGloss is an annotator used to mark up documents with concepts from the ontology. …

