Academic journal article Perception and Psychophysics

Visual Contribution to the Multistable Perception of Speech

Article excerpt

The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.

The human ability to follow speech gestures through the visual modality can be considered a core component of speech perception. Behavioral studies have demonstrated that concordant visual information improves speech intelligibility in noisy conditions (e.g., Benoit, Mohamadi, & Kandel, 1994; MacLeod & Summerfield, 1987; Robert-Ribes, Schwartz, Lallouache, & Escudier, 1998; Sumby & Pollack, 1954) and that speech gestures can be partly followed when audition is lacking (Bernstein, Demorest, & Tucker, 1998). Furthermore, even with perfect audio input, speechreading may improve speech intelligibility (Davis & Kim, 1998; Reisberg, McLean, & Goldfield, 1987). Conversely, seeing incongruent articulatory gestures can alter the perception of clear auditory input. For example, in the famous McGurk effect (McGurk & MacDonald, 1976), a visual /ga/ dubbed with an audio /ba/ is often perceived as /da/ or /tha/, whereas a visual /ba/ dubbed with an audio /ga/ is often perceived as /bga/ (for a review of experimental replications or refinements of the McGurk effect, see Green, 1998).

A number of theoretical models have been proposed concerning the possible cognitive basis of these findings (e.g., Bernstein, Auer, & Moore, 2004; Massaro, 1987, 1989; Summerfield, 1987). In their reanalysis of the five architectures proposed by Summerfield (1987), Schwartz, Robert-Ribes, and Escudier (1998) suggested that the "motor recoding model" was the most plausible with regard to experimental findings. In this model, audiovisual interactions take place in a common representation space in which the sensory inputs are related to compatible articulatory gestures. Though this architecture differs from the simple equivalence between percepts and gestures posited in the motor theories (e.g., Fowler & Rosenblum, 1991; Liberman & Mattingly, 1985; see Schwartz, Abry, Boe, & Cathiard, 2002, and Schwartz, Boe, & Abry, 2006, for reviews of similarities and differences), it considers perceptuomotor interactions as playing a key part in multisensory speech perception. Consistent with this view, it has been shown that seeing oneself articulating in a mirror improved identification of concordant acoustic syllables and deteriorated identification of discordant ones (Sams, Mottonen, & Sihvonen, 2005). Neurophysiological data have provided further evidence for such perceptuomotor interactions. Indeed, brain areas involved in the planning and execution of speech gestures (notably the left inferior frontal gyrus, the premotor and/or the primary motor cortex) have been found to be activated during audiovisual speech perception (e. …
