Regulating Recognition Decisions through Incremental Reinforcement Learning
Sanghoon Han and Ian G. Dobbins, Psychonomic Bulletin & Review
Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. In Experiment 1, feedback that was otherwise accurate probabilistically (and incorrectly) indicated that either misses or false alarms were correct. In Experiment 2, feedback was selectively withheld for either misses or false alarms but was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that persisted for considerable periods, even after feedback had been removed altogether. Overall, these data demonstrate that incremental reinforcement-learning mechanisms influence the degree of caution subjects exercise when evaluating explicit memories.
Recognition criteria are hypothetical standards by which memory evidence is categorized as either sufficient or inadequate to warrant a judgment of prior encounter (viz., "old") (Macmillan & Creelman, 1991; see Figure 1). Although most memory researchers assume that criteria are adaptive, there are few models of learning that might support such adaptability (however, see Estes & Maddox, 1995; Unkelbach, 2006), and to date, the vast majority of successful manipulations of memory decision criteria have involved explicit instructions given to observers about the relative preponderance of old and new items (Hirshman, 1998; Rotello, Macmillan, Reeder, & Wong, 2005; Strack & Förster, 1995) or explicit warnings to avoid errors of either omission or commission (Azimian-Faridani & Wilding, 2006). These instructed criterion shifts are sometimes augmented with clear descriptions of the monetary losses and gains attached to different response outcomes (payoff matrices; see Van Zandt, 2000), but in all of these cases, observers consciously attempt to comply with instructions, given their understanding of test list regularities or characteristics. What remains unclear is whether the decision criterion can adapt without an explicit or controlled strategy.
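In the standard equal-variance Gaussian signal detection model (Macmillan & Creelman, 1991), the position of the criterion is usually summarized by c = -0.5 * (z(H) + z(F)), where H is the hit rate, F is the false alarm rate, and z is the inverse standard normal CDF; sensitivity is d' = z(H) - z(F). The following is a minimal sketch under that assumption; the function name `sdt_measures` is illustrative only. Positive c indicates a conservative criterion (fewer "old" responses), and negative c a liberal one. Rates of exactly 0 or 1 are typically corrected before use (for example, with a log-linear adjustment); the sketch simply rejects them.

```python
from statistics import NormalDist
from typing import Tuple


def sdt_measures(hit_rate: float, fa_rate: float) -> Tuple[float, float]:
    """Return (d_prime, c) under the equal-variance Gaussian SDT model.

    hit_rate: proportion of old items called "old".
    fa_rate:  proportion of new items called "old".
    """
    for p in (hit_rate, fa_rate):
        if not 0.0 < p < 1.0:
            # inv_cdf is undefined at 0 and 1; apply a correction beforehand.
            raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf          # inverse standard normal CDF
    z_h, z_f = z(hit_rate), z(fa_rate)
    d_prime = z_h - z_f               # separation of old and new distributions
    c = -0.5 * (z_h + z_f)            # criterion relative to the midpoint
    return d_prime, c


if __name__ == "__main__":
    # Same sensitivity-style comparison at two response biases.
    print(sdt_measures(0.8, 0.2))     # unbiased: c is approximately 0
    print(sdt_measures(0.6, 0.1))     # conservative: c > 0
```

Because c is measured relative to the midpoint between the two distributions, it can change while d' stays constant, which is what makes it a natural index of the criterion shifts discussed here.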
One candidate mechanism that we propose might enable adaptive positioning of a criterion is incremental reinforcement learning, which is central to learning category distinctions in other, nonrecognition domains (e.g., Gluck & Bower, 1988; Poldrack et al., 2005). Such learning requires integrating trial-by-trial feedback outcomes and gradually remapping different decisions onto different stimulus features or feature combinations as a function of probabilistic reward likelihood (for a review, see Ashby & Maddox, 2005). Two category-learning paradigms with this characteristic are information-integration and probabilistic classification tasks. In both, the relationship between key stimulus features and appropriate decisions cannot be reduced to a simple explicit, verbalizable strategy, either because observers must classify the items on the basis of complex combinations of multiple feature dimensions (e.g., a nonlinear combination of the thickness and orientation of sinusoidal gratings) or because feedback is rendered probabilistically, so that making the same judgment for a given repeated stimulus does not guarantee receiving the same feedback outcome on every trial (see also Ashby & O'Brien, 2007). Neuropsychological findings suggest that learning during these tasks relies heavily upon the integrity of the striatum, a basal ganglia structure linked to implicit procedural and habit learning (Knowlton, Mangels, & Squire, 1996; Saint-Cyr, Taylor, & Lang, 1988).
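To make the idea of feedback-driven criterion placement concrete, here is a toy simulation. It is a hypothetical illustration, not the model tested in this article: it assumes equal-variance Gaussian evidence, 50% old items, and a simple error-driven rule in which the criterion moves up by a step `alpha` after feedback that a false alarm was wrong and down by `alpha` after feedback that a miss was wrong. The parameters `p_miss_as_correct` and `p_fa_as_correct` (names invented for this sketch) mimic the kind of manipulation described for Experiment 1, in which some errors are reported as correct; in this toy rule, such trials simply produce no update.

```python
import random


def simulate_criterion(
    n_trials: int,
    d_prime: float = 1.0,
    alpha: float = 0.02,
    p_miss_as_correct: float = 0.0,
    p_fa_as_correct: float = 0.0,
    seed: int = 0,
) -> float:
    """Return the mean criterion over the second half of a simulated session.

    Hypothetical toy model: old ~ N(d_prime, 1), new ~ N(0, 1), 50% old items,
    and the criterion k is nudged by alpha after each error flagged as wrong.
    """
    rng = random.Random(seed)
    k = d_prime / 2.0                 # start at the unbiased criterion
    burn_in = n_trials // 2
    total = 0.0
    for t in range(n_trials):
        is_old = rng.random() < 0.5
        evidence = rng.gauss(d_prime if is_old else 0.0, 1.0)
        says_old = evidence > k
        miss = is_old and not says_old
        false_alarm = (not is_old) and says_old

        # Manipulated feedback: an error may be falsely reported as correct,
        # in which case this toy learner makes no adjustment.
        if miss and rng.random() < p_miss_as_correct:
            miss = False
        if false_alarm and rng.random() < p_fa_as_correct:
            false_alarm = False

        if false_alarm:
            k += alpha                # an "old" error: become more conservative
        elif miss:
            k -= alpha                # a "new" error: become more liberal

        if t >= burn_in:
            total += k                # average only after the burn-in period
    return total / (n_trials - burn_in)


if __name__ == "__main__":
    print(simulate_criterion(20000, seed=1))                         # near d'/2
    print(simulate_criterion(20000, p_miss_as_correct=0.5, seed=1))  # higher
    print(simulate_criterion(20000, p_fa_as_correct=0.5, seed=1))    # lower
```

With accurate feedback, the rule settles where false alarm and miss rates balance, which for equal base rates is the unbiased point d'/2. Relabeling half of the misses as correct should push the criterion upward (more conservative), and relabeling false alarms should push it downward, without the simulated observer ever representing an explicit strategy.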
Although feedback-based changes in criteria have frequently been examined in perceptual judgment tasks (e.g., Dorfman & Biderman, 1971; Thomas, 1973), there are fundamental differences between perceptual classification tasks and the regulation of episodic recognition judgments. More specifically, in feedback-based category-learning tasks, it is assumed that the mapping between object features and category decisions is incrementally altered via trial-by-trial feedback learning. …