The Use of Spatialized Speech in Auditory Interfaces for Computer Users Who Are Visually Impaired

Structured abstract: Introduction: This article reports on a study that explored the benefits and drawbacks of using spatially positioned synthesized speech in auditory interfaces for computer users who are visually impaired (that is, are blind or have low vision). The study examined a practical application of such systems: an enhanced word-processing application, compared with conventional screen-reading software with a braille display. Methods: Two types of user interfaces were compared in two experimental conditions: a JAWS screen reader equipped with an ALVA 544 Satellite braille display and a custom auditory interface based on spatialized speech. Twelve participants were asked to read and process three different text files with each interface and to collect information about the form and structure of the files. Task-completion times and the correctness of the perceived information on text decorations, text alignment, and table structures were measured. Results: The spatial auditory interface proved to be significantly faster (3 minutes, 12 seconds) than the JAWS screen reader with ALVA braille display (8 minutes, 38 seconds), F(1,70) = 391.523, p < .001, and 15% more accurate when gathering information on text alignment, F(1,70) = 28.220, p < .001. No significant difference between the interfaces could be established when comparing questions on text decorations, F(1,70) = 0.912, p = .343, or table structures, F(1,70) = 1.045, p = .310. Discussion: The findings show that the auditory interface with spatialized speech is more than 160% faster than the tactile interface while remaining equally accurate and effective for gathering information on various properties of text and tables. Implications for practitioners: The spatial location of synthesized speech can be used for the fast presentation of the physical position of texts in a file, their alignment, the dimensions of tables, and the position of specific texts within tables. The quality of spatial sound reproduction can play an important role in the overall performance of such systems.
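
The "more than 160% faster" figure in the Discussion can be checked against the two mean task-completion times reported in the Results. The following is a minimal sketch, not part of the original study; the function names are hypothetical, and it assumes that "faster" is measured as the relative gain in speed (the ratio of the slower time to the faster time, minus one).

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration to whole seconds."""
    return minutes * 60 + seconds


def percent_faster(slow_s: float, fast_s: float) -> float:
    """Relative speed gain of the faster condition, in percent.

    Assumption: speed is proportional to 1 / time, so the gain is
    (slow time / fast time - 1) * 100.
    """
    return (slow_s / fast_s - 1.0) * 100.0


# Mean task-completion times as reported in the abstract.
auditory_s = to_seconds(3, 12)  # spatial auditory interface
braille_s = to_seconds(8, 38)   # JAWS with ALVA braille display

gain = percent_faster(braille_s, auditory_s)
print(f"Auditory interface is about {gain:.1f}% faster")
```

Under this assumption the gain comes out at roughly 170%, which is consistent with the abstract's "more than 160%".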

**********

Today, most computer interfaces are based on visual interaction and can be used effectively only by users who are able to see. Users who are visually impaired (that is, those who are blind or have low vision) compensate for the blocked visual channel by using other senses, such as hearing and touch. Tactile interfaces, that is, refreshable braille displays, provide an accurate and reliable method of interaction but are hampered by a lower reading speed. They also require extensive learning and adaptation time to be used effectively. Auditory interfaces, on the other hand, are in most cases more intuitive and can be used with much less prior learning, although they still require some knowledge of the meaning of different auditory cues or auditory icons (Sodnik, Dicke, & Tomazic, 2010).

Auditory interfaces can be divided into two major groups: speech- and nonspeech-based interfaces (Brewster, 2002). Speech interfaces are based on human speech that can be recorded and replayed or synthesized by a computer. Since speech is the most common and intuitive way of exchanging information, speech interfaces require little or no learning period and can be used by almost everyone, provided that the user understands the language used in the interface and is not hampered by any type of hearing impairment (Schmandt, 1994). Nonspeech interfaces are used mostly as an extension of graphical user interfaces (GUIs), visual interfaces that are presented via a computer screen and manipulated by a mouse and a keyboard. In this type of interface, sound is used to inform users about important background processes or programs that are running on their computers and require their attention at a certain moment, such as when new e-mail messages arrive, when computer viruses are detected, or when batteries on portable machines reach low levels. …