Academic journal article Literator: Journal of Literary Criticism, comparative linguistics and literary studies

An Overview of the EtsaTrans Machine Translation System: Compilation of an Administrative Domain/'n Oorsig Van Die EtsaTrans-Masjienvertalingstelsel: Die Samestelling Van 'N Administratiewe Domein

Article excerpt

Abstract

The EtsaTrans machine translation system has been in development at the University of the Free State for the last four years and is currently the only machine translation system being developed in South Africa for specialised (non-general) translation needs. The purpose of this exposition is to present the program through its phases of development and to report on current levels of performance. We analyse the output and the size of the database, and then propose the future implementation of a part-of-speech tagger and word stemmer in the program to improve its linguistic performance. Our goal with the system is not to translate all types of documents, but to work in a specialised domain that will allow the system to translate documents that are repetitive in nature. This will enable translators to spend more time on non-repetitive subject matter. By capturing the nature of the language of such repetitive documents in the database, we are able to create a standardised language usage for the specialised domain.

Key concepts:

administrative translation; domain specific; machine translation

Opsomming

Die EtsaTrans-masjienvertalingstelsel word die afgelope vier jaar reeds aan die Universiteit van die Vrystaat ontwikkel. Dit is tans die enigste masjienvertalingstelsel in Suid-Afrika wat vir gespesialiseerde (nie-algemene) vertalingsdoeleindes ontwikkel word. In hierdie uiteensetting word die program na gelang van sy ontwikkelingsfases beskryf en word daar oor die huidige verrigtingsvlakke verslag gegee. Ons kyk na die uitsette, databasisgrootte en die toekomstige inkorporering van 'n woordsoortetiketteerder en woordstamherkenner om die program se linguistiese werkverrigting te verbeter. Ons doel is nie om alle tipes tekste te kan vertaal nie, maar wel om in 'n gespesialiseerde domein te werk wat die stelsel in staat sal stel om dokumente van 'n repeterende aard te vertaal. Dit sal vertalers vrystel om tyd aan minder repeterende tekste te wy. Deur die aard van die taalgebruik in sulke repeterende dokumente in die databasis vas te vang, is ons in staat om 'n gestandaardiseerde taalgebruik vir die gespesialiseerde domein te skep.

Kernbegrippe:

administratiewe vertaling; domeinspesifiek; masjienvertaling

I. Historical background

The University of the Free State took over the rights of the LEXICA system from the company EPI-USE Systems in 2000. LEXICA was a transfer system that performed morphological, syntactic, semantic and contextual analyses and could be used for the following language pairs: Afrikaans, Setswana, Swahili and Portuguese to English; and English to isiXhosa, isiZulu and Afrikaans. The development of EPI-USE's LEXICA system began in 1990 and continued until the beginning of 2003. An evaluation of the system showed that continuing with the development of a purely rule-based machine translation (RBMT) system would be futile in the light of the latest developments within machine translation (see Snyman & Naude, 2003). Sumita and Iida (1999) state that conventional machine translation systems use rules as knowledge, and that it is difficult to build a practical system because of the problem of building such a large-scale rule base. They also mention the difficulties involved in improving translation performance, because the effect of adding a new rule is hard to anticipate and because translating with a large-scale rule-based system is time-consuming. The dictionaries that were developed were too broad to be of any use in a domain-specific field. Tests showed that although LEXICA could translate a document well enough to convey meaning, the results were not syntactically satisfactory (Snyman & Naude, 2003). Since the addition of new rules and data to the system did not bring about any significant improvement, new avenues of development needed to be explored within the latest developments in machine translation (MT).

The discontinuation of the RBMT system excluded pure RBMT from being considered as a possible MT paradigm. …
