Academic journal article Journal of Visual Impairment & Blindness

Perception of Synthetic and Natural Speech by Adults with Visual Impairments

Article excerpt

Abstract: This study investigated the intelligibility and comprehensibility of natural speech in comparison to synthetic speech. The results demonstrate the types of errors; the relationship between intelligibility and comprehensibility; and the correlation of intelligibility and comprehensibility with key factors, such as the frequency of use of text-to-speech systems.

**********

Recent technological progress in electronic augmentative communication devices has expanded communication possibilities for people who previously faced severe communication difficulties, such as individuals with visual impairments. As Mirenda and Beukelman (1990) noted, the most common method used to generate synthetic speech in modern communication devices is text-to-speech (TTS) synthesis. TTS systems use a flexible mathematical algorithm that represents rules for combining acoustic properties and rules for pronunciation (Mirenda & Beukelman, 1990).

The perception of synthetic speech is usually discussed with regard to intelligibility and comprehension (Koul, 2003). Intelligibility is the listener's ability to recognize phonemes and words when they are presented in isolation (Ralston, Pisoni, & Mullennix, 1989), whereas comprehension involves the extraction of the underlying meaning from the acoustic signals of speech (Duffy & Pisoni, 1992). The comprehension of synthetic speech involves recognizing the stimuli presented and then performing higher-level processing to obtain meaning. The term discourse comprehension refers to an automatic process that is used to encode and integrate passages, explanations, and conversations (Higginbotham, Drazek, Kowarsky, Scally, & Segal, 1994).

Despite the substantial data on the intelligibility and comprehension of synthetic speech systems by people with no disabilities (see, for example, Koul, 2003; Koul & Hanners, 1997; Mirenda & Beukelman, 1987, 1990), there has been limited research on the intelligibility and comprehension of synthetic speech systems by people with visual impairments (see, for example, Hensil & Whittaker, 2000). Numerous studies have found that natural speech is significantly more intelligible than that produced by TTS synthesis systems (Clark, 1983; Greene, Logan, & Pisoni, 1986; Hoover, Reichle, VanTasell, & Cole, 1987; Kangas & Allen, 1990; Koul & Allen, 1993; Logan, Greene, & Pisoni, 1989; Mirenda & Beukelman, 1987, 1990; Mitchell & Atkins, 1989; Ralston, Pisoni, Lively, Greene, & Mullennix, 1991).

For example, in a single-word intelligibility task, scores for high-quality synthesizers, such as DECtalk (a high-quality form of synthetic speech manufactured by Digital Equipment Corporation), ranged from 81.7% correct with an open-response format (Mirenda & Beukelman, 1987) to 96.7% correct with a closed-response format (Greene, Manous, & Pisoni, 1984, cited in Koul & Hanners, 1997). In contrast, word-intelligibility scores for natural speech ranged from 97.2% correct with an open-response format to 99% correct with a closed-response format (Logan et al., 1989).

Moreover, a review of relevant research on the perception of sentences produced by synthetic speech revealed a differentiated pattern of results, depending on the type of sentence spoken. According to Mirenda and Beukelman (1987), accuracy scores for meaningful sentences ranged from 96.7% when presented via the DECtalk synthesizer to 99.3% when presented via natural speech. However, for anomalous sentences, accuracy scores ranged from 78.7% for synthetic speech to 97.7% for natural speech (Pisoni & Hunnicutt, 1980, cited in Koul, 2003). Thus, as with single words, sentences are more intelligible in natural speech than in TTS speech, markedly so for anomalous sentences.

Furthermore, comprehension of sentences and narratives has been found to be slower and less accurate when materials are presented in synthetic rather than in natural speech (Higginbotham et al. …
