Similarity, Distance, and Categorization: A Discussion of Smith's (2006) Warning about "Colliding Parameters"

The idea that categorization decisions rely on subjective impressions of similarities between stimuli has been prevalent in much of the literature over the past 30 years and has led to the development of a large number of models that apply some kind of decision rule to similarity measures. A recent article by Smith (2006) has argued that these similarity-choice models of categorization have a substantial design flaw, in which the similarity and the choice components effectively cancel one another out. As a consequence of this cancellation, it is claimed, the relationship between distance and category membership probabilities is linear in these models. In this article, I discuss these claims and show mathematically that in those cases in which it is sensible to discuss the relationship between category distance and category membership at all, the function relating the two is approximately logistic. Empirical data are used to show that a logistic function can be observed in appropriate contexts.
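In the simplest case, the logistic form can be seen directly. The worked sketch below is illustrative only. It assumes a single reference point per category, an exponential similarity function with sensitivity parameter c > 0, and an unbiased ratio choice rule. The symbols d_A, d_B, s_A, s_B and c are introduced here for illustration and are not the article's own notation. Under these assumptions, the membership probability is exactly a logistic function of the difference in distances, not a linear one:

```latex
% Illustrative assumptions: one reference point per category,
% exponential similarity with sensitivity c > 0, unbiased ratio choice rule.
s_A = e^{-c\,d_A}, \qquad s_B = e^{-c\,d_B}

P(A \mid x) = \frac{s_A}{s_A + s_B}
            = \frac{e^{-c\,d_A}}{e^{-c\,d_A} + e^{-c\,d_B}}
            = \frac{1}{1 + e^{-c\,(d_B - d_A)}}
```

Dividing the numerator and the denominator by e^{-c d_A} leaves only the difference d_B - d_A in the exponent. The article's broader claim, that the relation is approximately logistic, concerns more general cases than this one.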

If every stimulus in our world were perceived as an entirely unique object, people would be inundated with an immense amount of pointless information. So we organize objects into categories, allowing us to describe the world in a simpler manner and to generalize better to novel situations. Not surprisingly, then, understanding the nature of human concepts and the way in which they shape our categorization behavior has remained one of the central topics in cognitive psychology. Ever since the decline of the classical view of concepts, which assumed that a category could be held together via a collection of features both necessary and sufficient to determine category membership, one of the key ideas in psychological theories has been that a category can be held together by a loose family resemblance among objects. According to this resemblance view of categorization (Rosch, 1978), it is the similarities between things that govern the extent to which people judge an item to belong to a category.

When formalized as cognitive models (e.g., Medin & Schaffer, 1978; Nosofsky, 1984), similarity-based theories rely on two key assumptions. First, they assume that the subjective sense of similarity between items decreases very rapidly (exponentially, in fact) as the items are made more distant in some suitable sense. Second, in order to account for behavior in forced-choice tasks, the models incorporate some kind of choice rule. In a recent article, Smith (2006) has argued that these similarity-choice models of categorization suffer from a major design flaw, resulting from a complex interaction between these two assumptions. In effect, he argued that similarity and choice "cancel," leaving a simple linear function relating category distance to category membership probabilities. The implication of the claim is that we might benefit by discarding the framework provided by similarity-based models of categorization and replacing it with models that predict category membership by applying linear functions to distances. Citing work by Roberts and Pashler (2000), Smith emphasized the importance of looking beyond a model's fit to data and the value of examining the internal structure of cognitive models, since it is just such an analysis that uncovers the cancellation effect.
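The two assumptions just described can be made concrete in a few lines. The Python sketch below is illustrative only. It assumes one-dimensional stimuli, absolute distance, exponential similarity exp(-c·d), and a ratio choice rule applied to summed similarities. This is loosely in the spirit of exemplar models such as Nosofsky (1984) but omits attention weights and response bias. The names similarity, prob_category_a and logistic are hypothetical helpers, not code from any published model.

```python
import math
from typing import Sequence


def similarity(distance: float, c: float = 1.0) -> float:
    """Assumed exponential decay of similarity with distance (sensitivity c > 0)."""
    return math.exp(-c * distance)


def prob_category_a(x: float, exemplars_a: Sequence[float],
                    exemplars_b: Sequence[float], c: float = 1.0) -> float:
    """Ratio choice rule applied to summed similarities (1-D stimuli, no bias)."""
    s_a = sum(similarity(abs(x - e), c) for e in exemplars_a)
    s_b = sum(similarity(abs(x - e), c) for e in exemplars_b)
    return s_a / (s_a + s_b)


def logistic(z: float) -> float:
    """Standard logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical example: one exemplar per category, at 0.0 (A) and 2.0 (B).
    c = 1.5
    for x in [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]:
        p = prob_category_a(x, [0.0], [2.0], c=c)
        d_a, d_b = abs(x - 0.0), abs(x - 2.0)
        # Compare the model's probability with a logistic in (d_b - d_a).
        print(f"x={x:4.1f}  P(A)={p:.3f}  logistic={logistic(c * (d_b - d_a)):.3f}")
```

With a single exemplar per category, prob_category_a coincides with logistic(c·(d_B - d_A)), so the two printed columns agree. Adding exemplars to either list lets one explore how the curve changes when similarity is summed over several items.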

In terms of the general suggestion that modelers need to be aware of complex interactions between parameters that can arise in some cases, it is difficult to disagree with Smith (2006). Indeed, in recent years, an extensive literature has built up regarding how to measure model performance in an appropriate way (e.g., Balasubramanian, 1997; Myung & Pitt, 1997; Navarro, Pitt, & Myung, 2004; Pitt, Myung, & Zhang, 2002) and how to understand the characteristic patterns that a model can produce (e.g., Myung, Kim, & Pitt, 2000; Pitt, Kim, Navarro, & Myung, 2006). Moreover, these methods have frequently been applied to the understanding of categorization models, including RULEX (Navarro, 2005), the generalized context model (Navarro, 2007), and ALCOVE (Pitt et al. …