Psychoacoustic Abilities as Predictors of Vocal Emotion Recognition

Published online: 27 July 2013

Abstract Prosodic attributes of speech, such as intonation, influence our ability to recognize, comprehend, and produce affect, as well as semantic and pragmatic meaning, in vocal utterances. The present study examines associations between auditory perceptual abilities and the perception of prosody, both pragmatic and affective. This association has not been previously examined. Ninety-seven participants (49 female, 48 male) with normal hearing thresholds took part in two experiments, involving both prosody recognition and psychoacoustic tasks. The prosody recognition tasks included a vocal emotion recognition task and a focus perception task requiring recognition of an accented word in a spoken sentence. The psychoacoustic tasks included a task requiring pitch discrimination and three tasks that also required recognition of pitch direction (i.e., high/low, rising/falling, changing/steady pitch). Results demonstrate that psychoacoustic thresholds can predict 31% and 38% of the variance in affective and pragmatic prosody recognition scores, respectively. Psychoacoustic tasks requiring pitch direction recognition were the only significant predictors of prosody recognition scores. These findings contribute to a better understanding of the mechanisms underlying prosody recognition and may have an impact on the assessment and rehabilitation of individuals suffering from deficient prosodic perception.

Keywords Psychoacoustics · Music cognition · Sound recognition · Audition


When we speak, we can express pragmatic and emotional meaning, not only through words, but also by changing certain attributes of our voice, such as fundamental frequency (f0, perceived as pitch), intensity (perceived as loudness), and duration. An all-encompassing term for such acoustic attributes of speech is prosody. Indeed, identical phrases may convey completely different pragmatic or emotional information, depending on their prosody. For example, an utterance conveying a joyful feeling is likely to be loud and display a relatively large f0 range, whereas a sad utterance would tend to be softer with a smaller f0 range (Sobin & Alpert, 1999).

Prosody recognition has been described as a multistep process, involving both sensory and cognitive mechanisms. Auditory perceptual mechanisms perform an initial acoustic analysis of the speech signal, followed by higher cognitive mechanisms, which derive pragmatic and emotional meaning from the acoustic components, employing preexisting socio-emotional scripts (Schirmer & Kotz, 2006). Studies investigating these cognitive mechanisms have associated vocal emotion recognition with the ability to comprehend others' mental and emotional states, commonly referred to as theory of mind (Kleinman, Marciano, & Ault, 2001; Rutherford, Baron-Cohen, & Wheelwright, 2002) or emotional intelligence (Trimmer & Cuddy, 2008). These studies have viewed vocal emotion recognition as part of a more general emotion recognition mechanism that guides attention to, and perception of, emotional cues across the different sensory channels (Adolphs, 2003).

A relatively limited number of studies have highlighted the importance of auditory perceptual abilities to the perception of prosody. These studies have focused mainly on populations representing the extreme ends of auditory capacities. For example, research conducted with hearing-impaired individuals showed that intonation, stress, and emphasis in voice are difficult for many individuals with severe hearing loss to perceive (Gold, 1987), including individuals with cochlear implants (Most & Peled, 2007). Other studies demonstrated that musicians, who represent better-than-average auditory capabilities, exhibit enhanced performance in identifying emotion in vocal utterances (Thompson, Schellenberg, & Husain, 2004).

The results of the above studies, focusing on unique populations, support an association between auditory abilities and the ability to perceive prosodic cues in speech. …

