Published online: 4 January 2013

© Psychonomic Society, Inc. 2012

Abstract Three experiments investigated whether extrinsic vowel normalization takes place largely at a categorical or a precategorical level of processing. Traditional vowel normalization effects in categorization were replicated in Experiment 1: Vowels taken from an [ɪ]-[ε] continuum were more often interpreted as /ɪ/ (which has a low first formant, F^sub 1^) when the vowels were heard in contexts that had a raised F^sub 1^ than when the contexts had a lowered F^sub 1^. This was established with contexts that consisted of only two syllables. These short contexts were necessary for Experiment 2, a discrimination task that encouraged listeners to focus on the perceptual properties of vowels at a precategorical level. Vowel normalization was again found: Ambiguous vowels were more easily discriminated from an endpoint [ε] than from an endpoint [ɪ] in a high-F^sub 1^ context, whereas the opposite was true in a low-F^sub 1^ context. Experiment 3 measured discriminability between pairs of steps along the [ɪ]-[ε] continuum. Contextual influences were again found, but without discrimination peaks, contrary to what was predicted from the same participants' categorization behavior. Extrinsic vowel normalization therefore appears to be a process that takes place at least in part at a precategorical processing level.

Keywords Speech perception · Categorization · Psycholinguistics

When listening to speech, listeners are faced with the problem that any particular phoneme is never realized twice in exactly the same way. The production of a speech sound can vary due to factors such as a speaker's gender or accent, or due to the phonetic context in which a phoneme is uttered (Heinz & Stevens, 1961; Hillenbrand, Clark, & Nearey, 2001; Hillenbrand, Getty, Clark, & Wheeler, 1995; Purnell, Idsardi, & Baugh, 1999). The variation that arises as a result of these factors can be so severe that phonemes can overlap with respect to their most important auditory cues (such as the first two formants, F^sub 1^ and F^sub 2^, in the case of vowels). Under normal listening conditions, however, listeners are hardly bothered by this variation. Part of the solution to this apparent contradiction lies in the fact that speech is perceived relative to general voice characteristics such as a speaker's pitch (R. L. Miller, 1953; Nearey, 1989) and higher formants (Nearey, 1989). The perception of vowels is influenced not only by vowel-intrinsic aspects (such as pitch and higher formants) but also by vowel-extrinsic context (Ladefoged & Broadbent, 1957). Listeners interpret vowels relative to voice characteristics that are revealed in a preceding sentence. If the speaker has a relatively high F^sub 1^, listeners interpret more ambiguous [ɪ]-[ε] sounds as representing /ɪ/ (the vowel with the relatively low F^sub 1^), whereas more vowels are interpreted as representing /ε/ when the speaker has a generally low F^sub 1^. This study investigates the cognitive locus of listeners' ability to use extrinsic information to compensate for a speaker's vocal tract characteristics.

Johnson, Strand, and D'Imperio (1999) demonstrated that listeners' categorizations of vowels can also be changed through more abstract knowledge such as perceived gender. They showed that categorization behavior for an F^sub 1^ [u]-[a] continuum differs depending on whether listeners saw a movie of a male or a female speaker (females generally have higher F^sub 1^ values than males). Moreover, they also showed that a similar influence can be found when listeners are made to believe that they are listening to a female or a male speaker (through instructions). Such results suggest that listeners perceive vowels relative to a cognitive model of the expected vowel space of a particular speaker. Normalization effects would then be the result of a speaker-dependent restructuring of the cognitive vowel space. The idea of such a restructuring of category boundaries contrasts with another proposal about vowel normalization. …

