In this paper we examine the translation processes and performance of 14 Danish MA translation and interpreting (T&I) students at Copenhagen Business School (CBS), who produced translations into English (their L2) under three different working conditions: written translation, sight translation, and sight translation using a speech recognition (SR) program, i.e. software which automatically converts spoken output into written text (see Jurafsky and Martin, 2000, pp. 235-284 for an introduction to SR technology). On the basis of analyses of task times, translation quality and pronunciation challenges, we discuss the benefits and drawbacks of using SR and provide suggestions for improved interaction with the system.
We address the following research questions:
1) What are the task times in the three modalities? Specifically, are there any time savings in sight translation with SR (henceforth SR translation) compared with written translation? Normally, one would expect spoken translation (including SR translation) to be considerably faster than written translation. However, the students were unfamiliar with the SR software and were dictating in their L2; both factors might increase the number of errors requiring correction and thus make this modality more time-consuming.
2) Is there any difference in translation quality across the three modalities? A previous small-scale study (see 2.1 below) showed significant time savings in oral compared with written translation without output quality being noticeably affected (Dragsted and Hansen, 2009). Since translators using SR, like translators working in the written modality, have a written representation on the screen, this might lead to better quality in the final SR output than in traditional sight translation output.
3) What types of misrecognitions occur when students sight translate with SR? How many are caused by students' erroneous pronunciations? Are there any other factors which may result in misrecognitions? It should be remembered that the students were working in their L2, and even though Danes find it easier to pronounce English correctly than do students from many other countries, pronunciation problems are nevertheless likely to occur.
2.1 A three-stage project
The present study reports on the initial experiments of the third stage of a larger project which investigates the coordination of comprehension and text production processes in translation, interpreting, and T&I hybrids, and the potential for convergence between the written and oral modalities of translation. The project was originally motivated by a desire to discover whether there are advantages to be gained from encouraging students to draw on oral strategies when they produce written translations.
As teachers, we have often found that students produce better translations if they trust their first intuition to a greater extent and think in terms of processing meaning rather than individual words. When writing translations, many learners appear to fall into the trap of endlessly seeking to optimise the text, rephrasing sentences over and over again; all too often, the result is a text that is neither particularly coherent nor natural. We therefore decided to introduce SR as a means of simulating an interpreting situation where both the source and target texts were visible. A further motivation for employing SR was that language technology increasingly dominates professional translators' lives, making it ever more essential that students are familiarised with the various tools of the trade. Apart from an introduction to translation memory systems and terminology databases, the CBS T&I curriculum at present does not include any language technology tools.
The project was planned to consist of the following three stages:
1) A pilot study (reported in Dragsted and Hansen, 2007) and a more detailed comparative study of written and sight translation (Dragsted and Hansen, 2009), both drawing on experimental data combining keystroke logging, eye-tracking and quality ratings of the spoken and written output. …