Structured abstract:
Introduction: This article reports on a study that explored the benefits and drawbacks of using spatially positioned synthesized speech in auditory interfaces for computer users who are visually impaired (that is, are blind or have low vision). The study evaluated a practical application of such systems: an enhanced word processing application, compared with conventional screen-reading software with a braille display.
Methods: Two types of user interfaces were compared in two experimental conditions: a JAWS screen reader equipped with an ALVA 544 Satellite braille display and a custom auditory interface based on spatialized speech. Twelve participants were asked to read and process three different text files with each interface and to collect information about their form and structure. Task-completion times and the correctness of the perceived information on text decorations, text alignment, and table structures were measured.
Results: The spatial auditory interface proved to be significantly faster (3 minutes, 12 seconds) than the JAWS screen reader with ALVA braille display (8 minutes, 38 seconds), F(1,70) = 391.523, p < .001, and 15% more accurate when gathering information on text alignment, F(1,70) = 28.220, p < .001. No significant difference between the interfaces could be established when comparing questions on text decorations, F(1,70) = 0.912, p = .343, or table structures, F(1,70) = 1.045, p = .310.
Discussion: The findings show that the auditory interface with spatialized speech is more than 160% faster than the tactile interface while remaining equally accurate and effective for gathering information on various properties of text and tables.
Implications for practitioners: The spatial location of synthesized speech can be used for the fast presentation of the physical position of texts in a file, their alignment, the dimensions of tables, and the position of specific texts within tables. The quality of spatial sound reproduction can play an important role in the overall performance of such systems.
**********
Today, most computer interfaces are based on visual interaction and can be used effectively only by users who are able to see. Users who are visually impaired (that is, those who are blind or have low vision) compensate for the blocked visual channel by using other senses, such as the auditory channel and the sense of touch. Tactile interfaces, that is, refreshable braille displays, provide an accurate and reliable method of interaction, but are hampered by a lower reading speed. They also require extensive learning and adaptation time to be used effectively. Auditory interfaces, on the other hand, are in most cases more intuitive and can be used with much less prior learning, although they still require a certain amount of knowledge of the meaning of different auditory cues or auditory icons (Sodnik, Dicke, & Tomazic, 2010).
Auditory interfaces can be divided into two major groups: speech- and nonspeech-based interfaces (Brewster, 2002). Speech interfaces are based on human speech that can be recorded and replayed or synthesized by a computer. Since speech is the most common and intuitive way of exchanging information, speech interfaces require a short or no learning period and can be used by almost everyone, provided that the user understands the language used in the interface and is not hampered by any type of hearing impairment (Schmandt, 1994). Nonspeech interfaces are used mostly as an extension of graphical user interfaces (GUIs), visual interfaces that are presented via a computer screen and manipulated by a mouse and a keyboard. In this type of interface, sound is used to inform users about important background processes or programs that are running on their computers and that require their attention at a certain moment, such as when new e-mail messages arrive, when computer viruses are detected, or when batteries on portable machines reach low levels.
There are several types of computer interfaces that are designed especially for users who are visually impaired. In most cases, the auditory and tactile interfaces are merely supplements to the standard GUIs that are intended to present the graphically oriented content with sound or to display it on a braille display. Braille displays are available only from specialty manufacturers and are therefore expensive and not available to all potential users. The tools most commonly used by visually impaired computer users are screen readers. A screen reader scans the content of a GUI and reads the text parts with the use of synthesized speech.
In general, screen readers focus mainly on the text and give almost no information on the physical structure of the document, such as window sizes, text orientation, and style. In some cases, this structural information can be provided at the user's request through special keyboard shortcuts. For example, a screen reader informs the user of the current location in a table by saying: "column 4 of 5, row 2 of 3."
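To make the table example concrete, the following minimal sketch shows how a spoken position string of this kind could be assembled from a cell's coordinates before it is passed to a speech synthesizer. The function name `announce_table_position` and its parameters are hypothetical and chosen for illustration only; the sketch does not reflect the implementation or API of JAWS or any other screen reader.

```python
def announce_table_position(column: int, n_columns: int, row: int, n_rows: int) -> str:
    """Build the phrase a screen reader might speak for a table cell.

    All positions are 1-based, matching the way the phrase is spoken.
    This is an illustrative helper, not part of any real screen reader.
    """
    # Reject positions that do not lie inside the table.
    if not (1 <= column <= n_columns and 1 <= row <= n_rows):
        raise ValueError("cell position lies outside the table")
    return f"column {column} of {n_columns}, row {row} of {n_rows}"


# The cell from the example in the text: fourth of five columns, second of three rows.
print(announce_table_position(4, 5, 2, 3))
```

In this form, the table's dimensions and the cell's position are conveyed purely through words, so the user has to listen to the whole phrase before knowing where the cell is.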
Crispien, Wurz, and Weber (1994) used spatial sound as an extension of a screen reader. The main part of their work was spatially positioned synthesized speech, which enabled the user to identify the position of the spoken text parts in relation to the visual representation on the screen. Their audio processing was based on a …