There is a small but growing literature on the perception of natural acoustic events, but few attempts have been made to investigate complex sounds not systematically controlled within a laboratory setting. The present study investigates listeners' ability to make judgments about the posture (upright-stooped) of the walker who generated acoustic stimuli contrasted on each trial. We use a comprehensive three-stage approach to event perception, in which we develop a solid understanding of the source event and its sound properties, as well as the relationships between these two event stages. Developing this understanding helps both to identify the limitations of common statistical procedures and to develop effective new procedures for investigating not only the two information stages above, but also the decision strategies employed by listeners in making source judgments from sound. The result is a comprehensive, ultimately logical, but not necessarily expected picture of both the source-sound-perception loop and the utility of alternative research tools.
Humans exhibit extensive abilities to use sounds to identify and monitor events in their environment; these abilities are often unappreciated and, with the exception of speech, largely unexplored (see, e.g., Handel, 1989, 1995; McAdams, 1984, 1993). As examples of these abilities, auto mechanics, physicians, and even plumbers use sound to detect possible abnormalities and make preliminary diagnoses of probable cause (e.g., Jenkins, 1985). Similar diagnostic listening strategies for human gait can be effective in podiatric medicine (J. Wernick, personal communication, May 2004). Thus, although a major long-term goal of hearing research is to understand typical auditory perception, the majority of research has investigated only the perception of simple, easily specified, laboratory-created stimuli. The resulting body of fundamental research has provided important, detailed specifications of the basic functioning of the human auditory system (e.g., Hartmann, 1998; Moore, 2003; Yost, 1994), but this scientific knowledge often seems to have limited direct relevance for understanding the ability to recognize the nature of complex natural acoustic source events. Although this situation could arise because the existing principles represent too low a conceptual level (or level of organization) to be relevant, it is at least equally likely that we lack sufficient understanding of natural source-event perception to see the relevance. The present study is designed from this perspective.
Speech is the one example of a class of complex natural source events that has been studied extensively in terms of the relationship between the properties of production (the source event) and the sounds produced, as well as in terms of the links between possible invariant acoustic cues and perceptual categories. As a source event, speech is complex and highly variable, yet it is characterized by relatively simple, discrete feature categories. If simple source features mapped in a relatively straightforward manner to sound, the basis for listener perception of the speech features would have been easy to identify. One hallmark of more than half a century of speech research, however, has been the failure to identify invariant acoustic cues for features of production (see, e.g., Raphael, 2005). Rather than entertaining the possibility of a complex mapping between source and sound, researchers have often assumed that speech perception involves some form of a highly specialized, closed module (e.g., Liberman & Mattingly, 1985), possibly one initially shaped by early experience (e.g., Werker & Tees, 1999). If valid, this supposition of a unique speech perception mechanism would mean that the body of research on speech would not be relevant to understanding the perception of other types of natural acoustic events.
One alternative perspective, based upon Gibson's ecological conceptualization (Gibson, 1966, 1979), is that the dynamic structure not only of speech, but of all natural acoustic events, determines the structure of the sounds produced, which in turn allows direct perception of the source event (see, e. …