We introduce the newly developed Formosan Language Archive and illustrate its construction through examples drawn from Rukai, a Formosan language spoken across southern Taiwan that comprises six main dialects (Mantauran, Maga, Tona, Budai, Labuan, and Tanan) exhibiting great variation. After presenting the layout of the archive, we explain how texts and sound files are recorded and digitized, how words are tagged, and what purpose the search system serves. Finally, we compare the Formosan Language Archive with other well-established language archives and show how and why we have adopted a somewhat different layout, one that enables us to capture, through thorough linguistic analysis, the variation displayed in the Rukai dialects (and in Formosan languages in general).
1. INTRODUCTION. Of the 24 or so Formosan languages known to have been spoken in Taiwan up to the twentieth century (Keta[n]galan, Taokas, Papora, Babuza, Favorlang, Hoanya, Siraya, Makattao, Taivoan, Kavalan, Pazeh, Thao, Atayal, Saisiyat, Bunun, Tsou, Saaroa, Kanakanavu, Rukai, Paiwan, Puyuma, Amis, Seediq, Yami), nearly half (the first nine listed) are already extinct, and the others are declining rapidly. The possible reasons for language death in Taiwan are diverse but to some extent interrelated: early sinicization of the plains tribes; loss of the languages as a legitimate means of daily communication under a fifty-year government policy that imposed Mandarin Chinese as the sole official language; the passing of elderly speakers in communities where the languages are still extant; and the emigration of younger villagers to neighboring towns.
The Formosan languages exhibit great variation that is still not well understood. Until the mid-1990s, research on them was largely neglected. Preliminary studies carried out during the Japanese occupation laid the foundations for more detailed descriptions. These were followed by a series of descriptions of the synchronic and diachronic phonologies of the Formosan languages, as well as discussions of their genetic classification. In the past few years, a renewed surge of interest has produced an influx of studies carried out within different theoretical orientations. However, in this community-shared attempt to salvage the cultures and languages of the Formosan tribes, we are faced with two major contradictions. First, data collection remains a solitary enterprise whose results are usually not shared within the linguistic community: what gets published is the end product of fieldwork, that is, linguistic descriptions and analyses. Second, for practical reasons such as time constraints, difficulty in accessing the material, and pressure from academic institutions to publish theoretically relevant analyses rather than text collections and similar materials, linguists working on the Formosan languages do not usually transcribe texts but content themselves with recording unrelated sets of sentences. As a result, very few text collections have been published for the Formosan languages. (2)
The Formosan Language Archive has been developed within Academia Sinica under the auspices of the National Science Council. One of its purposes is to collect, conserve, edit, and disseminate via the World Wide Web a virtual library of language and linguistic resources that provides access to recorded and transcribed Formosan text collections. A pilot study was conducted in 2001. In 2002 the project was granted national status, with an initial span of five years. It is hoped that by 2006, texts from at least nine of the fifteen extant Formosan languages (Rukai, Yami, Saisiyat, Tsou, Atayal, Bunun, Paiwan, Amis, and Puyuma) will have been archived with the help of linguists, engineers, and speakers of the languages themselves.
This paper presents an overview of this newly developed archive and illustrates its construction through examples drawn from Rukai. …