Generalization and Similarity in Exemplar Models of Categorization: Insights from Machine Learning

Article excerpt

Exemplar theories of categorization depend on similarity for explaining subjects' ability to generalize to new stimuli. A major criticism of exemplar theories concerns their lack of abstraction mechanisms and thus, seemingly, of generalization ability. Here, we use insights from machine learning to demonstrate that exemplar models can actually generalize very well. Kernel methods in machine learning are akin to exemplar models and are very successful in real-world applications. Their generalization performance depends crucially on the chosen similarity measure. Although similarity plays an important role in describing generalization behavior, it is not the only factor that controls generalization performance. In machine learning, kernel methods are often combined with regularization techniques in order to ensure good generalization. These same techniques are easily incorporated in exemplar models. We show that the generalized context model (Nosofsky, 1986) and ALCOVE (Kruschke, 1992) are closely related to a statistical model called kernel logistic regression. We argue that generalization is central to the enterprise of understanding categorization behavior, and we suggest some ways in which insights from machine learning can offer guidance.

Intuitive definitions of categorization tend to invoke similarity, in that objects that are similar are grouped together in categories. Within a category, similarity is very high, whereas between categories, similarity is low. Similarity is at the heart of many categorization models. Prototype theories postulate that categorization depends on the similarity of stimuli to an abstracted idea (Posner & Keele, 1968; Reed, 1972), and exemplar theories calculate similarity to memory representations of previously encountered stimuli (Kruschke, 1992; Medin & Schaffer, 1978; Nosofsky, 1986). A potential problem for these models is that they put the burden of explanation onto the intuitive concept of similarity. Despite serious problems in defining similarity (Medin, Goldstone, & Gentner, 1993), models of categorization continue to rely on similarity.
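The contrast between the two families can be made concrete with a minimal sketch (ours, not taken from any of the cited models), assuming stimuli are points in a perceptual space and adopting an exponential-decay similarity function as a placeholder; the function names are ours:

```python
import numpy as np

def similarity(x, y, c=1.0):
    """Exponential-decay similarity in perceptual space (a placeholder;
    which measure is the 'right' one is exactly what is at issue)."""
    return np.exp(-c * np.linalg.norm(x - y))

def prototype_evidence(x, exemplars, c=1.0):
    """Prototype account: similarity to one abstracted summary,
    here simply the mean of the category's stored exemplars."""
    return similarity(x, exemplars.mean(axis=0), c)

def exemplar_evidence(x, exemplars, c=1.0):
    """Exemplar account: summed similarity to every stored exemplar."""
    return sum(similarity(x, e, c) for e in exemplars)
```

Note that both accounts lean on the same similarity function; only what the stimulus is compared against differs, which is why both families inherit the definitional problems just noted.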

The appeal of invoking similarity in categorization models stems from the need to generalize. Given a stimulus that has never been encountered before, how can it be categorized correctly on the basis of limited experience with previous stimuli? An easy answer seems to be that a new stimulus is simply categorized in the same way as similar stimuli have been before. Correct generalization to new stimuli thus depends crucially on choosing the right similarity measure. Shepard (1987) famously turned this reasoning around and used generalization to measure similarity. He also attempted to deduce a similarity measure under which generalization performance would likely be good (Chater & Vitanyi, 2003; Shepard, 1987; Tenenbaum & Griffiths, 2001).
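A toy example (hypothetical numbers, ours) makes this dependence explicit: with summed exemplar similarity as the choice rule, the steepness of the similarity gradient alone can flip which category a novel stimulus is assigned to:

```python
import numpy as np

# Hypothetical 1-D stimuli: three stored exemplars of category A,
# one of category B, and a novel stimulus in between.
cat_a = np.array([0.0, 1.0, 2.0])
cat_b = np.array([5.0])
novel = 4.0

def summed_similarity(x, exemplars, c):
    # Summed exponential-decay similarity to the stored exemplars;
    # c sets how steeply similarity falls off with distance.
    return np.exp(-c * np.abs(exemplars - x)).sum()

for c in (0.1, 2.0):
    a = summed_similarity(novel, cat_a, c)
    b = summed_similarity(novel, cat_b, c)
    print(f"c = {c}: P(A | novel) = {a / (a + b):.2f}")
# c = 0.1 -> P(A) ~ 0.71 (a broad gradient favors the larger category A)
# c = 2.0 -> P(A) ~ 0.14 (a steep gradient favors the single nearby B exemplar)
```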

In Shepard's work, the idea of a perceptual space has played a major role. The similarity measure he suggested, often called Shepard's universal law of generalization (or simply Shepard's law), operates on a mental representation assumed to be a metric space. Shepard's work on generalization and similarity (e.g., Shepard, 1957, 1987) cannot be separated from his work on categorization (e.g., Shepard & Chang, 1963; Shepard, Hovland, & Jenkins, 1961) and multidimensional scaling (MDS; e.g., Shepard, 1962). Since this work, it has become common for perceptual categorization models to assume a perceptual space and to use Shepard's law as a similarity measure in this space (Kruschke, 1992; Love, Medin, & Gureckis, 2004; Nosofsky, 1986). Exemplar models in particular strongly rely on Shepard's work. These models are very similar to a class of popular tools in machine learning and statistics: kernel methods. This observation was first made by Ashby and Alfonso-Reese (1995). Here, we draw parallels between recent progress in kernel methods and exemplar theories of categorization. …
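For reference, Shepard's law states that similarity (equivalently, the generalization gradient) decays exponentially with distance in the psychological space; in notation of our choosing:

```latex
% Shepard's law: similarity decays exponentially with the distance d
% in psychological space; c > 0 sets the steepness of the gradient.
s(\mathbf{x}, \mathbf{y}) = e^{-c \, d(\mathbf{x}, \mathbf{y})}
```

With a city-block metric, this expression coincides with what the machine-learning literature calls the Laplacian kernel, and substituting squared Euclidean distance yields the Gaussian (RBF) kernel; this correspondence is one way to see why exemplar models and kernel methods line up so closely.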