Academic journal article Attention, Perception and Psychophysics

Speech Perception as Categorization

Article excerpt

Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition.

Spoken syllables may persist in the world for mere tenths of a second. Yet, as adult listeners, we are able to gather a great deal of information from these fleeting acoustic signals. We may apprehend the physical location of the speaker, the speaker's gender, regional dialect, age, emotional state, or identity. These spatial and indexical factors are conveyed by the acoustic speech signal in parallel with the linguistic message of the speaker (Abercrombie, 1967). Although these factors are of much interest in their own right, speech perception (SP) most commonly refers to the perceptual mapping from acoustic signal to some linguistic representation, such as phonemes, diphones, syllables, words, and so forth.

Most of the research in the field of SP has focused on the mapping from the acoustic speech signal to phonemes, the smallest linguistic unit that changes meaning within a particular language (e.g., /r/ and /l/ as in rake vs. lake), with the often implicit assumption that phoneme representations are a necessary step in the comprehension of spoken language. The transformation from acoustics to phonemes occurs so rapidly and automatically that it mostly escapes our notice (Näätänen & Winkler, 1999). Yet this apparent ease masks the complexity of the speech signal and the remarkable challenges inherent in phoneme perception.

As a starting point, one might presume that phoneme perception is accomplished by detecting characteristics in the acoustic signal that correspond to each phoneme or by comparing a phoneme template in memory with segments of the incoming signal. In fact, this was the presumption in the early days of SP, starting in the 1940s (see Liberman, 1996), and it led to the hope that machine speech recognition was on the horizon. However, it became clear rather quickly that SP was not a simple detection or match-to-pattern task (Liberman, Delattre, & Cooper, 1952). Although there has been a wealth of studies documenting the acoustic "cues" that can signal the identity of different phonemes (see Stevens, 2000, for a review), there is significant variability in the relationship of these cues to the intended phonemes of a speaker and the perceived phonemes of a listener. The variability is due to a multitude of sources, including differences in speaker anatomy and physiology (Fant, 1966), differences in speaking rate (Gay, 1978; Miller & Baer, 1983), effects of the surrounding phonetic context (Kent & Minifie, 1977; Öhman, 1966), and effects of the acoustic environment such as noise or reverberation (Houtgast & Steeneken, 1973). The end result of all of these sources of variability is that there appear to be few or no invariant acoustic cues to phoneme identity (Cooper, Delattre, Liberman, Borst, & Gerstman, 1952; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; but see Blumstein & Stevens, 1981, for a possible exception). This means that listeners cannot accomplish SP by simply detecting the presence or absence of cues.

In place of a simple match-to-sample or detection approach, SP is now often conceived of as a complex categorization task accomplished within a highly multidimensional space. …
