Visual Determinants of a Cross-Modal Illusion

Contrary to the predictions of established theory, Schutz and Lipscomb (2007) have shown that visual information can influence the perceived duration of concurrent sounds. In the present study, we deconstruct the visual component of their illusion, showing that (1) cross-modal influence depends on visible cues signaling an impact event (namely, a sudden change of direction concurrent with tone onset) and (2) the illusion is controlled primarily by the duration of post-impact motion. Other aspects of the post-impact motion (distance traveled, velocity, acceleration, and jerk, the rate of change of acceleration) play a minor role, if any. Together, these results demonstrate that visual event duration can influence the perception of auditory event duration, but only when stimulus cues are sufficient to give rise to the perception of a causal cross-modal relationship. This refined understanding of the illusion's visual aspects helps explain why it contrasts so markedly with previous research on cross-modal integration, which indicated that vision does not appreciably influence auditory judgments of event duration (Walker & Scott, 1981).

Schutz and Lipscomb (2007) reported a naturally occurring audio-visual illusion in which visual information changes the perceived duration of simultaneous auditory information. They demonstrated this by showing participants videos of a percussionist striking a marimba with either a long flowing gesture (labeled "long") that covered a large arc or a short choppy gesture (labeled "short") that rebounded off the bar and quickly stopped. Although the resultant sounds were acoustically indistinguishable and participants were asked to ignore visual information when judging tone duration, duration ratings were longer when the tones were paired with long rather than short gestures.

In light of evidence that vision does not influence auditory judgments of tone duration (Walker & Scott, 1981), this illusion is unexpected. It is an exception to the rule that, with respect to a given task, the modality offering less accurate information does not appreciably influence the modality offering more accurate information. For example, the superior temporal precision of the auditory system generally translates into auditory dominance for temporal tasks such as the judgment of tone duration. Likewise, estimates of flash timings are more affected by temporally offset tones than estimates of tone timings are affected by temporally offset flashes (Fendrich & Corballis, 2001); and auditory flutter rate affects the perception of visual flicker rate, whereas the rate of visible flicker either fails to affect the perceived rate of concurrent auditory flutter (Shipley, 1964) or affects it minimally (Welch, DuttonHurt, & Warren, 1986).

Understanding the Illusion

We believe that the perception of a causal link between auditory and visual information is crucial to explaining why the illusion reported by Schutz and Lipscomb (2007) conflicts so strongly with previous work on sensory integration. However, before presenting evidence in support of this view, we will first discuss two alternative explanations that have been previously dismissed by Schutz and Kubovy (in press). We will close this section by explaining our reasons for proposing that causality plays an important role and by discussing links between this illusion and previous work on the unity assumption.

Post-perceptual processing cannot explain the illusion. As has been shown by Arieh and Marks (2008), certain patterns of cross-modal interactions may be explained by decisional changes, rather than by sensory shifts. Therefore, it is possible that longer gestures could have suggested longer durations, affecting ratings through a top-down process (i.e., a response bias), without any actual perceptual shift. To test this explanation, Schutz and Kubovy (in press) designed a series of experiments manipulating the causal relationship between the auditory and visual components of the stimuli. …

