Academic journal article Literator: Journal of Literary Criticism, comparative linguistics and literary studies

Derivational Relations in English, Czech and Zulu wordnets/Afleidingsverhoudings in Engelse, Tsjeggiese En Zoeloe Woordnette

Abstract

This article investigates one kind of cross-part-of-speech relation in English, Czech and Zulu lexical resources that take the form of semantic networks (wordnets). Many languages have rules whereby new words are derived regularly and productively from existing words via morphological processes. The morphologically unmarked base words and the derived words, which share a semantic core with the base words, can be interlinked and integrated into wordnets, where they typically form "derivational nests" or subnets. We describe efforts to capture the morphological and semantic regularities of derivational processes in English, Czech and Zulu, with the aim of comparing the linguistic mechanisms involved and exploiting them for computational processing and wordnet construction. While some work has already been done for English and Czech, wordnets for Zulu and other Bantu languages are still in their infancy. This article illustrates how Zulu can benefit from existing work.

Key concepts:

derivational relations; lexical resources; semantic relations; wordnets

Summary

In this article, one type of cross-part-of-speech relation is investigated for English, Czech and Zulu lexical resources in the form of semantic networks (wordnets). Many languages have rules according to which new words are regularly and productively derived from existing words via morphological processes. The morphologically unmarked base words and the derived words, which share a semantic core with the base words, can be linked to and integrated into wordnets, where they typically form "derivational nests" or subnets. We describe efforts to capture the morphological and semantic regularities of derivational processes in English, Czech and Zulu so that the linguistic mechanisms can be compared and exploited for suitable computational processing and wordnet construction. While some work has already been done for English and Czech, wordnets for Zulu and other Bantu languages are still at an early stage of development. This article indicates how Zulu can benefit from existing work.

Key concepts:

derivational relations; lexical resources; semantic relations; wordnets
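The derivational nests described in the abstract can be pictured as small subgraphs in which a morphologically unmarked base word is linked to the words derived from it. The Python sketch below models such a nest as a graph and collects every member reachable from any one word. It is an illustration under stated assumptions, not the data model of any existing wordnet: the class name DerivationalNet, the (language, lemma, part of speech) entry format and the relation labels "agentive" and "action_noun" are all hypothetical, and the example pairs (English teach/teacher, Czech učit/učitel, Zulu -fundisa/umfundisi) were chosen for illustration rather than taken from the article's data.

```python
from collections import defaultdict, deque

# Hypothetical entry format: (language code, lemma, part of speech).
Entry = tuple[str, str, str]


class DerivationalNet:
    """Graph of derivational links between base words and derived words.

    The relation labels passed to it are illustrative assumptions, not an
    inventory taken from any published wordnet.
    """

    def __init__(self) -> None:
        self.links: dict[Entry, list[tuple[str, Entry]]] = defaultdict(list)

    def add_derivation(self, base: Entry, derived: Entry, relation: str) -> None:
        # Record the link in both directions so that the nest can be reached
        # from any member, not only from the morphologically unmarked base.
        self.links[base].append((relation, derived))
        self.links[derived].append((f"inverse:{relation}", base))

    def nest(self, start: Entry) -> set[Entry]:
        """Return every entry reachable from `start` through derivational links."""
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            # .get avoids inserting empty entries for words with no links.
            for _relation, neighbour in self.links.get(current, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen


if __name__ == "__main__":
    net = DerivationalNet()
    net.add_derivation(("en", "teach", "v"), ("en", "teacher", "n"), "agentive")
    net.add_derivation(("en", "teach", "v"), ("en", "teaching", "n"), "action_noun")
    net.add_derivation(("cs", "učit", "v"), ("cs", "učitel", "n"), "agentive")
    net.add_derivation(("zu", "-fundisa", "v"), ("zu", "umfundisi", "n"), "agentive")
    # Starting from a derived word still recovers the whole English nest.
    print(sorted(net.nest(("en", "teacher", "n"))))
```

Storing an inverse link keeps the nest reachable from either end; a real resource would more likely encode the direction explicitly in its own relation inventory.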

I. Introduction: background and motivation

Arguably the greatest challenge for natural language processing (NLP) is discriminating among the distinct senses associated with a single word form. The most frequently used words are also the most polysemous, which poses a serious problem for texts that are not restricted to a specific domain such as finance or medicine. Many systems rely on lexical resources: traditional dictionaries that have been converted into machine-readable form, or electronic lexicons whose formats may be designed specifically for NLP applications. Word sense disambiguation can be defined as the task of matching a word token in a text with the appropriate sense entry in a lexical resource, which serves as a de facto standard for the sense inventory of a language.
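To make this definition of word sense disambiguation concrete, the Python sketch below matches the context of a token against the glosses of its candidate senses using the simplified Lesk algorithm, a well-known gloss-overlap baseline that the article does not itself prescribe. The two-sense inventory for "bank", its sense IDs and glosses, and the stopword list are invented for illustration; a real system would draw its sense inventory from a lexical resource such as a wordnet.

```python
import re
from typing import Optional

# Hypothetical sense inventory for the word form "bank"; IDs and glosses
# are invented for illustration, not taken from any real lexicon.
SENSES = {
    "bank.n.finance": "an institution that accepts deposits of money and makes loans",
    "bank.n.river": "the sloping land beside a body of water such as a river",
}

# Small illustrative stopword list; real systems use larger ones.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to",
             "in", "on", "at", "is", "i", "we", "she", "he", "it"}


def content_words(text: str) -> set[str]:
    """Lower-case alphabetic tokens of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context: str, senses: dict[str, str]) -> Optional[str]:
    """Return the sense whose gloss shares most content words with `context`.

    Ties go to the sense listed first; None means no gloss overlaps at all.
    """
    context_set = content_words(context)
    best_sense: Optional[str] = None
    best_overlap = 0
    for sense_id, gloss in senses.items():
        overlap = len(context_set & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("She deposited money at the bank", SENSES))  # prints bank.n.finance
    print(simplified_lesk("We walked along the river bank to the water", SENSES))  # prints bank.n.river
```

Exact string overlap misses pairs such as "deposited" and "deposits"; relating morphologically similar word forms to one another is one way such gaps might be narrowed.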

Like a human user who does not know a word, automatic systems need as much information as possible about the word they are trying to disambiguate in order to distinguish the intended sense from similar but inappropriate ones. A good lexicon for NLP therefore connects as many semantically related words to one another as possible, in the form of definitions, example sentences or semantic pointers. Wordnets are electronic lexical resources that contain all of these, and their appeal for NLP lies in the way they use semantic relations to interconnect word forms and senses in a giant network. Each word form with a specific meaning occupies a unique position in that network and can be identified by virtue of its particular constellation of relations to other words. Wordnets are most useful when their network is dense, i.e. when a given word is connected to many other words, as more links mean more semantic information and thus better discrimination of individual word senses. …
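The claim that a sense is identified by its constellation of links, and that denser networks discriminate better, can be illustrated with a toy example. The Python sketch below is an assumption-laden illustration rather than a method from the article: it scores each candidate sense of an ambiguous word by how many context words appear among the lemmas of its network neighbours, and it measures density as the average number of links per sense. The sense IDs, the "lemma.pos.tag" ID convention and the links themselves are hypothetical.

```python
from collections import defaultdict

# Undirected sense network: sense ID -> IDs of senses linked to it by any
# semantic relation. IDs follow an assumed "lemma.pos.tag" convention.
Network = dict[str, set[str]]


def build_network(links: list[tuple[str, str]]) -> Network:
    """Build an undirected network from (sense, sense) pairs."""
    network: Network = defaultdict(set)
    for a, b in links:
        network[a].add(b)
        network[b].add(a)
    return network


def lemma(sense_id: str) -> str:
    """Extract the lemma from an ID such as "bank.n.finance"."""
    return sense_id.split(".")[0]


def average_degree(network: Network) -> float:
    """Mean number of links per sense, a simple measure of density."""
    if not network:
        return 0.0
    return sum(len(nbrs) for nbrs in network.values()) / len(network)


def constellation_scores(candidates: list[str], context: set[str],
                         network: Network) -> dict[str, int]:
    """Count, per candidate sense, the context words among its neighbours' lemmas."""
    return {
        sense: len({lemma(n) for n in network.get(sense, set())} & context)
        for sense in candidates
    }


if __name__ == "__main__":
    candidates = ["bank.n.finance", "bank.n.river"]
    context = {"money", "loan"}

    sparse = build_network([])
    dense = build_network([
        ("bank.n.finance", "money.n.currency"),
        ("bank.n.finance", "loan.n.credit"),
        ("bank.n.river", "water.n.liquid"),
        ("bank.n.river", "shore.n.coast"),
    ])
    # With no links, both senses score 0 and cannot be told apart.
    print(average_degree(sparse), constellation_scores(candidates, context, sparse))
    # With links, the context favours the financial sense.
    print(average_degree(dense), constellation_scores(candidates, context, dense))
```

The contrast between the empty and the linked network mirrors the point of the paragraph: the same context carries no disambiguating signal until the senses are connected to other words.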
