Academic journal article Psychonomic Bulletin & Review

Learned Face-Voice Pairings Facilitate Visual Search

Article excerpt

Published online: 15 July 2014

© Psychonomic Society, Inc. 2014

Abstract Voices provide a rich source of information that is important for identifying individuals and for social interaction. During search for a face in a crowd, voices often accompany visual information, and they facilitate localization of the sought-after individual. However, it is unclear whether this facilitation occurs primarily because the voice cues the location of the face or because it also increases the salience of the associated face. Here we demonstrate that a voice that provides no location information nonetheless facilitates visual search for an associated face. We trained novel face-voice associations and verified learning using a two-alternative forced choice task in which participants had to correctly match a presented voice to the associated face. Following training, participants searched for a previously learned target face among other faces while hearing one of the following sounds (localized at the center of the display): a congruent learned voice, an incongruent but familiar voice, an unlearned and unfamiliar voice, or a time-reversed voice. Only the congruent learned voice speeded visual search for the associated face. This result suggests that voices facilitate the visual detection of associated faces, potentially by increasing their visual salience, and that the underlying crossmodal associations can be established through brief training.

Keywords Crossmodal integration · Visual search · Face perception · Spatial attention

Searching for a specific face is a common experience, from finding a friend to identifying a security threat. Research investigating visual search for faces with a specific identity has generally used visual stimuli (e.g., Kuehn & Jolicoeur, 1994; Tong & Nakayama, 1999). However, in real-world search for a specific face, voices frequently accompany faces and carry information about the speaker's location, identity, and emotion.

Understanding of how crossmodal information, such as faces and voices, is integrated in the brain continues to evolve. Evidence for crossmodal convergence in the parietal (e.g., Schroeder & Foxe, 2002) and temporal (Benevento, Fallon, Davis, & Rezak, 1977; Schroeder & Foxe, 2002) lobes suggests that information converges in multimodal association areas only after processing in unimodal areas. However, recent studies have suggested that crossmodal signals influence processing in primary sensory areas of the brain (e.g., Ghazanfar & Schroeder, 2006; Giard & Peronnet, 1999; Molholm et al., 2002; Schroeder & Foxe, 2005; Shams, Iwaki, Chawla, & Bhattacharya, 2005). For example, Watkins and colleagues have utilized the sound-induced flash illusion to demonstrate auditory modulation of primary visual cortex (Watkins, Shams, Josephs, & Rees, 2007; Watkins, Shams, Tanaka, Haynes, & Rees, 2006).

Studies exploring the consequences of auditory-visual interactions have found that sounds facilitate recognition and search when the sound and visual object are spatially coincident (Bolognini, Frassinetti, Serino, & Làdavas, 2005; Driver & Spence, 1998; Frassinetti, Bolognini, & Làdavas, 2002; Stein, Meredith, Huneycutt, & McDade, 1989). Similarly, sounds facilitate visual processing when beeps and flashes are temporally coincident (e.g., Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008).

Beyond spatial and temporal crossmodal correspondences, experience- or identity-based crossmodal correspondences also influence perceptual processing. These correspondences rely on the co-occurrence of features and/or congruence in identity across stimulus modalities. For example, congruent audiovisual pairs (e.g., a picture of a dog and a barking sound) elicit faster object recognition response times (RTs) than unimodal stimuli do (Hein et al., 2007; Molholm, Ritter, Javitt, & Foxe, 2004), as well as increased negativity in the N1 ERP component (Molholm et al. …
