Between 2006 and 2008, a video corpus of Sign Language of the Netherlands (Nederlandse Gebarentaal, or NGT) was created with the support of the Netherlands Organisation for Scientific Research (NWO, grant no. 380-70-008). While the original goal of the project was to create a large research database for linguistic investigation irrespective of the researcher's location and institution, early on in the project it was decided to make the data publicly available. Before this time, various parties in the Netherlands, including interpreter trainers, sign language teachers, and interpreters, had expressed considerable interest in such data. Given the absence of written resources for signed languages, the availability of video materials can potentially have a significant impact on deaf communities.
All ninety-two participants who were recorded for the project through December 2008 signed a consent form indicating they agreed to online publication. This article raises several issues relating to "informed consent" as it applies to the publication of sign language data as open content on the Internet. First of all, to what extent are deaf people with varying levels of Dutch literacy aware of the status and impact of a consent form? Although the statements on the form were explained to them in sign language, one may wonder to what extent this counts as a voluntary and well-informed decision. Second, one may wonder whether it is possible to agree to the online publication of such recordings given the rapid technological developments that we have seen in the last decade. Just as few people would have foreseen the significance of sharing social data in applications like Facebook and Google Earth, we cannot predict the impact of new technologies. Will face recognition on the basis of movies be built into every operating system in ten years' time? These are new types of considerations that all touch upon the "well-informed decision" that is inherent in informed consent. This article describes some current developments in this area on the Internet. The next two sections focus on new licenses to protect the use of data, and the section that follows them addresses the central question of the value of informed consent in the publication of sign language corpora.
Technological Advances in the Study of Signed Languages
The linguistic study of spoken languages has long been restricted to the analysis of written resources. For centuries, grammars and dictionaries have been based on written rather than spoken language. Text documents have been increasingly available and accessible, and the first computer technologies in the 1960s and 1970s were able to process only text, not audio or video recordings. In fact, it took quite some time before corpus linguistics as a separate branch of linguistics arose. Aside from technological impediments, it was not until Labov's observations of variation in speech in the 1960s that the study of the use of language (rather than the knowledge or structure of language) became an independent area of study. The rise of a separate discipline of corpus linguistics, which uses larger collections of texts as data, followed in the 1970s. The development of language technologies such as automatic speech recognition, which facilitated the study of speech corpora, did not take off until less than twenty years ago. This new phase enabled the study of speech behavior in addition to written texts, thereby allowing for insights into speech and everyday spoken interactions, which are typically more spontaneous than writing. With the rapid rise of Internet use and the mass publication of text on web pages, text corpora have become more prominent in linguistics, as they constitute a rich source of information on everyday language use that is now available in huge quantities online.
By contrast, there is a dearth of resources of any kind for signed languages. Written materials have never really played a role in any deaf community. …