The present study examined basic assumptions embedded in learning models for predicting behavior in decisions based on experience. In such decisions, the probabilities and payoffs are initially unknown and are learned from repeated choice with payoff feedback. We examined combinations of two rules for updating past experience with new payoff feedback and of two choice rule assumptions for mapping experience onto choices. The combination of these assumptions produced four classes of models that were systematically compared. Two methods were employed to evaluate the success of learning models for approximating players' choices: One was based on estimating parameters from each person's data to maximize the prediction of choices one step ahead, conditioned on the observed past history of feedback. The second was based on making a priori predictions for the entire sequence of choices using parameters estimated from a separate experiment. The results indicated the advantage of a class of models incorporating decay of previous experience, whereas the ranking of choice rules depended on the evaluation method used.
Interest has recently been rising in learning models applied to choices in repeated-play games. Studies of choice behavior in individual (see, e.g., Busemeyer & Myung, 1992; Erev & Barron, in press; Sarin & Vahid, 1999) and multiplayer (see, e.g., Camerer & Ho, 1999b; Cheung & Friedman, 1997; Erev & Rapoport, 1998; Erev & Roth, 1998; Fudenberg & Levine, 1995; Sarin & Vahid, 2001; Stahl, 1996) games have shown that learning in repeated-choice problems can be summarized by surprisingly simple mathematical models. The purpose of this article is to provide a systematic comparison of the basic assumptions used to construct models for decision making based on experience from repeated play.
We evaluated four classes of models that were formed by combining two basic assumptions about learning rules with two basic assumptions about choice rules (see Table 1). The learning rules in these models differed according to the manner in which past experience was updated on the basis of new feedback: In one class, called the interference models, only the chosen option was updated, and unchosen options remained unchanged; in the other, called the decay models, the chosen option was updated and the unchosen options were discounted by some amount. The choice rules differed according to the manner in which past experience was mapped onto choice behavior: In one class, the option producing the maximum expectation was always chosen (with some guessing allowed); in the other, choices were probabilistically determined by the strength of expectation.
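The two choice rules can be contrasted in a small sketch. This is an illustrative formulation, not the authors' exact equations; the sensitivity parameter `theta` and the guessing rate `epsilon` are assumed names.

```python
import math

def ratio_of_strengths(expectancies, theta):
    """Probabilistic rule: choice probability grows with the strength
    of an option's expectancy (a softmax-style ratio of strengths)."""
    weights = [math.exp(theta * e) for e in expectancies]
    total = sum(weights)
    return [w / total for w in weights]

def maximization_with_guessing(expectancies, epsilon):
    """Deterministic rule: choose the option with the maximum
    expectancy, but guess uniformly with probability epsilon."""
    n = len(expectancies)
    best = max(range(n), key=lambda j: expectancies[j])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs
```

Under the ratio rule every option retains some probability of being chosen; under maximization, only the guessing rate keeps inferior options alive.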
Two methods were used to compare the empirical validity of the models. The first was based on "one-step-ahead" predictions, and the second on simulation of the entire game. Under the first method, the model predicted the player's next choice using the past history of payoffs actually experienced by that player. In this case, the model parameters were estimated separately for each player to maximize the likelihood of the observed choices. Using the second method, model simulations were generated to predict the proportion of choices, averaged across players, for the entire length of the game. In this case, parameters estimated from one experiment were used to generate a priori predictions for a second experiment. The advantage of the first method was that it allowed tests of the model at the individual level, but it also had two disadvantages: (1) It had to rely on the actual past choices of an individual, and (2) it required fitting parameters to the data. The second method circumvented both of these disadvantages, but it could not be used to test models at the individual level. By using both methods, we hoped to achieve a convergence of evidence.
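The one-step-ahead procedure can be sketched as follows. The `predict_probs` function is a hypothetical stand-in for any of the learning models: it maps the feedback history observed so far to a choice probability for each of the four decks.

```python
import math

def one_step_ahead_log_likelihood(choices, payoffs, predict_probs):
    """Sum the log probability that a model assigns to each observed
    choice, conditioned on the actual feedback history up to that
    trial. `choices` and `payoffs` are the participant's observed
    sequence; `predict_probs(history)` returns four probabilities."""
    log_like = 0.0
    history = []
    for choice, payoff in zip(choices, payoffs):
        probs = predict_probs(history)
        log_like += math.log(probs[choice])
        history.append((choice, payoff))  # feed back the actual outcome
    return log_like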
A four-alternative choice task was chosen for model comparison rather than a simple two-alternative (binary) task because it allowed us to explore the main difference between interference and decay models. In interference models, only experience related to the selected alternative is updated following a choice. Thus, whereas in a binary task an update in the experience of the chosen alternative implies a mirroring update in the unchosen alternative, this is not so in a multiple-alternative task.
The particular task we selected was a highly studied problem called the Iowa gambling task (Bechara, A. R. Damasio, H. Damasio, & Anderson, 1994). This gambling task has been extensively studied in its association with decision-making deficits of individuals with brain damage (for a review, see Busemeyer, Stout, & Finn, in press; see also Bechara & H. Damasio, 2002; Clark & Robbins, 2002).
The present article begins by presenting the Iowa gambling task and summarizing the basic empirical results. Second, the models to be compared are described. Rather than contrasting specific models, we will compare basic assumptions made by these models about the updating of expectancies and examine different choice rules. Third, different methods for evaluating learning models are described: One method is based on predicting the choices of individuals in a one-step manner, the other on simulating the choice probabilities of the group average for the entire sequence of choice trials. Finally, the models are empirically evaluated on the basis of data (Yechiam, Stout, Busemeyer, Rock, & Finn, 2005) collected under different payoff conditions of the Iowa gambling task.
EXPERIMENT AND BASIC FINDINGS
In the Iowa gambling task, participants are presented with four card decks, labeled A, B, C, and D. They are told to accumulate as much (real) money as possible by picking cards from the decks. Initially, they know nothing about the payoffs produced by each deck, and this has to be learned from trial-by-trial feedback. Decks differ with respect to the payoff for each card selection and the frequency and severity of penalties, as indicated in Table 2.
Note that Decks C and D may appear to be disadvantageous when considering the gain domain. However, in each trial the winnings are also paired with losses on many cards in such a way that Decks A and B are disadvantageous overall, leading to an average expected loss of 25 cents per trial, whereas Decks C and D are advantageous overall, leading to an average expected gain of 25 cents per trial.
Yechiam et al. (2005) examined the choices of young adults, mostly (85%) college students, in the Iowa gambling task.1 Two primary conditions were manipulated in their study: In the first, the standard "partial-information" version of the task was used; in this version, only the payoff for the chosen gamble was shown on each trial. The second condition featured a modified "full-information" version of the task; in this version, the payoffs from all four gambles were shown on each trial (although earnings were based solely on the chosen option).
Yechiam et al. (2005) also performed a secondary manipulation involving the size of the payoffs. In the low-payoff condition, disadvantageous decks (A and B) had the exact wins/losses indicated in Table 2. In the high-payoff condition, these payoffs (wins and losses) were multiplied by a constant factor of 1.5. This manipulation had little effect, and results from the two versions were pooled in the study. In the present analysis, the two payoff magnitude conditions were utilized to provide a cross-validation test of the models (as described in the Simulation Method section below).
Participants. The original study included 162 young men and women, who were 22 years old on average and had 14 years of education. Because of a technical problem, the trial-by-trial data of 7 individuals were lost, leaving 76 participants in the partial-information condition and 79 participants in the full-information condition available for modeling.
Apparatus. The experiment used a computer-simulated version of the gambling task developed by Bechara et al. (1994). Images of four card decks labeled A, B, C, and D were displayed horizontally and "face down" on a monitor controlled by a desktop computer. The participants were instructed to make a series of selections from the decks using the mouse and to try to win as much money as possible. They received a $20.00 credit at the start of the task and were informed that their winnings would be paid at the end of the session as long as they continued until the game was completed (150 trials). Gains and losses were shown on two tally bars at the top of the display, the top one revealing the cumulative net win/loss and the bar below it indicating the win/loss for the most recent selection (see Figure 1).
The results in the partial-information condition (see Figure 2) show a slow but significant effect of learning, with fewer choices from disadvantageous decks (A and B) and more from advantageous decks (C and D) as a function of time (first versus last block of 25 trials).2 Note that these results deviate from Bechara et al.'s (1994) original results, in which control participants (who had lesions in the left somatosensory area and were about 20 years older than our participants on average) learned to choose Decks C and D consistently over Decks A and B. However, note that the present task did not include constraints on the order of losses as Bechara et al.'s did, and thus it might have been more difficult in our study to learn the contingencies. Also, the characteristics of the populations were different.
In contrast to most follow-ups of Bechara et al. (1994), Yechiam et al. (2005) focused on choices of specific decks (rather than grouping Decks C and D together as "advantageous"). The results revealed that for the partial-information condition, Decks B and D were chosen, on average, more often than Decks A and C [.62 vs. .38; t(75) = 6.12, p < .01]. The low-frequency decks (B and D) with respect to negative payoffs (p = .1 for a negative payoff) were chosen more often on average, despite having expected values equal to those of the high-frequency decks (A and C, respectively). This result was observed for both the advantageous and disadvantageous decks (B chosen more than A, D chosen more than C); it replicates previous findings in simpler binary tasks (Barron & Erev, 2003; Erev & Barron, in press; Hertwig, Barron, Weber, & Erev, 2004) and signal detection tasks (Barkan, Zohar, & Erev, 1998), showing that decision makers underweight small-probability events with experience.
A comparison of the partial- and full-information conditions (see Figure 2) reveals the following important differences: In the full-information condition, on average, more choices were made from decks with a low frequency of negative payoffs [B and D, .70 with full information vs. .62 with partial information; t(153) = 3.07, p < .01] and fewer choices from advantageous Deck C [.15 vs. .24; t(153) = 3.95, p < .01]. Furthermore, for Deck C, choices decreased over time (although the decrease was largely limited to the first 50 trials). This finding is interesting, since it indicates that more information led to poorer performance in the task (and a $2 decrease in earnings from $23 to $21). It also replicates previous results indicating that forgone payoffs can, under certain conditions, increase risk seeking (see, e.g., Yechiam & Busemeyer, 2005). Thus, the results replicate some of the robust regularities in the performance of experience-based choice tasks. The next step was to systematically compare mathematical learning models that can capture these regularities.
EXPERIENCE-BASED DECISION-MAKING MODELS
An examination of the learning models used by previous researchers reveals that most models employ three groups of assumptions: First, a utility function is used to represent the evaluation of the payoff experienced immediately after each choice. Second, a learning rule is used to form an expectancy (or propensity) for each choice alternative that summarizes the experience of all of the past utilities produced by each choice alternative. Third, a choice rule is based on the comparison of the expectancies formed for each choice option. Various learning theories posit one set of assumptions about the process for updating expectancies and another set about the choice rule. We first present the formulation of the models for the partial-information condition, in which the decision maker only experiences the payoff for the chosen alternative. Later, we will extend the formulation to the full-information condition, to include forgone-payoff information.
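This three-part structure can be sketched roughly as follows. The linear utility form, the attention weights `w_gain` (W) and `w_loss` (L), and all function names are illustrative assumptions, not the article's exact equations; `update_rule` and `choice_rule` are placeholders for any of the learning and choice rules compared here.

```python
def utility(win, loss, w_gain, w_loss):
    """Stage 1: evaluate the payoff just experienced. win and loss
    are the (nonnegative) amounts won and lost on the trial."""
    return w_gain * win - w_loss * loss

def run_model(payoffs, update_rule, choice_rule, w_gain, w_loss, n_trials):
    """Stages 2 and 3: form expectancies from past utilities and map
    them to choices. `payoffs(deck)` yields (win, loss) for a pick;
    `update_rule(expectancies, deck, u)` returns new expectancies;
    `choice_rule(expectancies)` returns the chosen deck index."""
    expectancies = [0.0, 0.0, 0.0, 0.0]
    choices = []
    for _ in range(n_trials):
        deck = choice_rule(expectancies)                    # stage 3
        win, loss = payoffs(deck)
        u = utility(win, loss, w_gain, w_loss)              # stage 1
        expectancies = update_rule(expectancies, deck, u)   # stage 2
        choices.append(deck)
    return choices, expectancies
```

The models compared below differ only in which `update_rule` and `choice_rule` are plugged into this loop.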
Two general classes of models have been proposed to account for how new information is accumulated in a learning task. Under one class of models, the weight of the old expectancy from an alternative changes only if new information is added about that alternative. Thus, the old expectancy from an alternative is discounted only if new information about outcomes from that alternative is presented. This class of models can be labeled interference models, because memory is only changed by relevant events and not simply as a function of time (see, e.g., Newell, 1992; Oberauer & Kliegl, 2001). Examples include the delta learning model (see, e.g., Busemeyer & Myung, 1992; Sarin & Vahid, 1999) and Bayesian learning (see, e.g., J. R. Anderson & Matessa, 1992).
In a second class of models, the weight of the old expectancy of an alternative can decrease on each choice trial even if no new information about a particular alternative is presented. Thus, expectations about one option can change as a result of selections of other alternatives. This class can be labeled decay models, because decay of memory occurs purely as a function of time, even without the occurrence of interfering events (see, e.g., Atkinson & Shiffrin, 1968; Broadbent, 1958). Examples include the reinforcement learning model used by Erev and Roth (1998) and the EWA model (Camerer & Ho, 1999a, 1999b). The different learning models are reviewed next.
We will initially present the specific assumptions of the models for the partial-information condition. Recall that in this condition, the decision maker is only given feedback about the payoff for the chosen alternative on each trial.
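Under partial information, the contrast between the two classes of updating rules can be illustrated with a minimal sketch. The parameterizations below (a delta rule for the interference class and a decay reinforcement rule for the decay class, both governed by a parameter `phi`) are our assumptions for illustration, not the article's exact formulas.

```python
def delta_update(expectancies, chosen, u, phi):
    """Interference-style (delta) rule: only the chosen option moves,
    toward the experienced utility u; unchosen expectancies are left
    untouched. phi acts as a learning rate."""
    new = list(expectancies)
    new[chosen] += phi * (u - new[chosen])
    return new

def decay_update(expectancies, chosen, u, phi):
    """Decay-style rule: every expectancy is discounted by phi on
    every trial, even for options that were not selected, and the
    experienced utility is added to the chosen option."""
    new = [phi * e for e in expectancies]
    new[chosen] += u
    return new
```

The key behavioral difference is visible in the unchosen options: the delta rule leaves them unchanged, whereas the decay rule shrinks them on every trial.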
The second method of model evaluation did not use any information about the actual choices made by any of the participants. Instead, it was based on evaluating the a priori predictions of each model (Busemeyer & Wang, 2000), which was done as follows.
Recall that our experiment included two different payoff conditions. We used parameters estimated from the first payoff condition to generate new predictions for the second payoff condition, and then these predictions were evaluated using the independent data from the second payoff condition. Thus, the parameters used to generate the predictions for the second payoff condition were based on the parameter estimates obtained in the first condition (see Rieskamp, Busemeyer, & Laine, 2003). This procedure is relatively parsimonious, since it is based on no further parameter estimation. However, because the two payoff conditions were relatively similar, the procedure cannot be considered a true test of generalization, but rather a cross-validation of the robustness of the set of parameters to changes in the evaluation method and in the payoff magnitude.
The parameters of the model were estimated in the low-payoff condition for one-step-ahead predictions (as detailed in the previous section). Then, the same parameters were used to generate the full simulation path in the high-payoff condition. A total of 1,000 simulations were generated to produce a distribution of choice sequences from a given model in the high-payoff condition, and these results were averaged to produce the probability of choosing each deck on each trial. We then examined the mean square deviation (MSD) of the model's predicted probability as compared with the observed proportion of choices on each trial, averaged across participants.
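The MSD computation described above can be sketched as follows; the trial-by-deck grids stand for the average of the 1,000 simulated paths and the observed group-average choice proportions.

```python
def mean_squared_deviation(predicted, observed):
    """MSD between predicted choice probabilities and observed choice
    proportions, averaged over all trials and decks. Both arguments
    are lists of per-trial rows, one entry per deck."""
    total, count = 0.0, 0
    for p_row, o_row in zip(predicted, observed):
        for p, o in zip(p_row, o_row):
            total += (p - o) ** 2
            count += 1
    return total / count
```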
EVALUATION USING THE PREDICTION METHOD
For the (standard) partial-information condition, eight alternatives for the updating of propensities were compared. Five of these alternatives were essentially interference models: the delta rule, the original and modified Bayesian rules, and a reinforcement rule assuming a decreasing learning rate (as a function of either time or number of choices from a deck). The other alternatives were decay models, including the decay reinforcement model and the two instantiations of the EWA model. Three choice rules were simultaneously evaluated: a deterministic maximization rule, with or without a decrease in guessing, and a probabilistic ratio-of-strengths rule. These models were evaluated in an 8 (learning) × 3 (choice) table of models, which allowed us to decompose the unique contribution of each component of a model. A BIC score was obtained for each participant and each model.
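A BIC-style score of the kind used here can be sketched as follows. The baseline-relative form, with uniform random guessing as the baseline, is our assumption (it is consistent with higher scores being better and with negative scores being possible, as reported below), since the exact formula is not spelled out here.

```python
import math

def bic_score(log_like_model, n_params, n_trials, n_options=4):
    """BIC-style score relative to a random-choice baseline: higher is
    better, and the n_params * ln(n_trials) term penalizes extra free
    parameters. A model no better than chance scores negative."""
    log_like_baseline = n_trials * math.log(1.0 / n_options)
    return (2.0 * (log_like_model - log_like_baseline)
            - n_params * math.log(n_trials))
```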
Table 3 presents the median BIC scores of models compared using the prediction method, pooled across participants. The results show that it is possible to distinguish the competing models empirically using the BIC index. Most importantly, the models that had the highest BIC scores were all in the decay category, which included the decay reinforcement and the EWA models. The decay reinforcement model produced the highest BIC scores across all assumptions about the choice rule.
An examination of different choice rules shows that, overall, models including the ratio choice rule produced higher BIC scores than did models including either of the two max rules. This result was consistent for all learning models. Thus, the learning model with the highest BIC score (22.17) was the reinforcement learning model that featured decay and ratio choice, although the two instantiations of the EWA model had high fit scores as well.
In addition to examining group medians, we assessed between-participants heterogeneity. Table 4 summarizes the percentages of participants whose average choice proportions were best approximated (i.e., had the highest BICs) under the different models. As in the previous comparison, the three decay models outperformed the five interference models. The single model that had the highest BIC for the largest proportion of individual participants was the decay reinforcement model with a ratio choice rule (31.6%). Across all choice rules, the two instances of the EWA model together had the highest BIC for 29.3% of the participants. Interestingly, these results also show that some of the models producing a low average BIC across all participants still best approximated the behavior of a small proportion of the players. Notably, the delta learning model under the ratio rule best captured the behavior of about 7% of the participants, as did the modified Bayesian learning model. The clear bottom line, however, is that a decay model with the ratio-of-strengths choice rule produces better approximations of the next choice ahead.
To examine the effect of individual differences in decay and interference updating, we conducted post hoc Spearman correlation tests between the fit of the model (expressed by the BIC score) and choice proportions from specific decks. The only significant result was a negative correlation between the fit of the delta model and choice from disadvantageous Deck B, the deck with a 10% chance for large losses [r(16) = -.28, p < .05]. Thus, it appears that the delta model was less successful in predicting the behavior of those who made many choices from Deck B. The decay models were unaffected by this behavior, because the preference for Deck B is easily captured by a decay formula with an extreme recency parameter.8
For the full-information condition, three learning models were selected for further testing. The delta learning model was selected as the one with the highest BIC from the interference models. The decay reinforcement and the EWA (with ρ = φ) models were selected for their high BIC scores in the decay group. These learning models were examined under two choice rules: the probabilistic ratio rule and the maximization with constant guessing rule. In addition, three basic assumptions regarding full information were compared: (1) Decision makers may simply ignore the forgone payoff information. In this case, the difference between decay and interference models is exactly the same as in the partial-information condition. Under this assumption, expectancies are updated with γ = 0; that is, weight is given only to feedback from the chosen deck. (2) Decision makers may give equal weight to actual and forgone payoffs, so that γ = 1 and equal weight is given to payoffs observed for all four decks. In this case, there is no mathematical difference between the delta and decay reinforcement models.9 (3) Decision makers may give more weight to actual payoffs, but still give smaller weight to forgone payoffs. In this last case, γ is a free parameter, and the difference between interference and decay models depends on the weight assigned by the model to unchosen alternatives.
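The role of γ can be made concrete with a sketch of a decay-style update under full information. The parameterization is an illustrative assumption: the chosen deck gets full weight, and forgone payoffs are weighted by `gamma` (0 ignores them, 1 weights them equally).

```python
def decay_update_full_info(expectancies, chosen, utilities, phi, gamma):
    """Decay reinforcement update with forgone payoffs: all four
    observed utilities can contribute. phi is the decay parameter;
    gamma scales the weight given to unchosen decks' payoffs."""
    new = []
    for j, (e, u) in enumerate(zip(expectancies, utilities)):
        weight = 1.0 if j == chosen else gamma
        new.append(phi * e + weight * u)
    return new
```

With γ = 0 this reduces to the partial-information decay rule; with γ = 1 all decks are updated identically regardless of which was chosen.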
Table 5 presents the BIC scores of the models. The results indicate that, as in the partial-information condition, the BIC scores differentiate between competing models. First, as in the partial-information results, the BIC scores under the ratio choice rule were relatively higher than under the maximization rule (the latter scores were all negative). Second, under the ratio rule, the most flexible of the assumptions, in which more weight is given to the chosen deck than to forgone payoffs, produced the highest BIC scores for all three learning models. Finally, again as in the partial-information condition, the decay reinforcement model produced the highest BIC score (7.76).
Note that using both choice rules, the difference between the delta model's BIC and the two other models' is largest under the assumption of giving weight only to chosen decks (with the ratio rule, a difference of 9.8 from the best BIC; with the max rule, a difference of 10.5). Under the assumption of equal weight to both payoffs, in which case all decks are updated regardless of choice, there is mathematically no difference between the delta and the decay reinforcement models, and both are slightly better than the EWA model with ρ = φ. Under the assumption of differential weighting among decks, the difference between the delta model and the best BIC increases again (with ratio, 9.9; with max, 4.5). Thus, it appears that the poorer performance of the delta model for predictions of one step ahead is largely due to the facts that (1) models that assign low weight to unchosen decks are more successful and (2) the delta model assumes that expectancies of unchosen decks are updated at a different rate than those of chosen decks.
EVALUATION USING THE SIMULATION METHOD
The simulation focused on the combination of the best three learning rules from the interference and decay classes and the two choice rules examined above. Recall that the simulation examined the models using the parameters estimated in the previous section with the prediction method. Table 6 summarizes the parameters of each model. These parameters were computed by averaging across the individual estimates obtained using the prediction method for participants in the low-payoff condition.
The parameters were used to generate the simulations for the high-payoff condition. Note that a priori predictions were generated for the high-payoff condition because the parameters were based on the estimates obtained from the low-payoff condition. Table 7 (top) presents the MSDs of the different models for the high-payoff condition. Note that the MSDs are based on percentage scores rather than proportions.
The results indicate that the simulation method shows smaller differences between models than does the prediction method. In terms of updating propensities, the best model for both payoff conditions was the EWA model with ρ = φ, but the delta model with the maximization choice rule had slightly better fit than did the decay reinforcement model. As for choice rules, it appears that the large advantage of the ratio choice rule over the maximization rule that appeared for the prediction method diminished in the simulation. This result is likely due to the fact that, averaged over many individuals, sensitivity in the simulation remained constant in time. Thus, the assumption of changing sensitivity over time embedded in the ratio choice rule was unnecessary.
To understand these results, we plotted in Figure 3 the path estimated by the decay reinforcement and the delta models with either the ratio rule or the maximization rule. The trial-by-trial predictions were smoothed by using a moving average filter of seven trials. The results show that the delta model with the ratio rule predicts fewer choices made from Deck B, the deck with the infrequent but high losses. In fact, under any combination of parameters (that we examined), the delta plus ratio model cannot reproduce the strong preference for Deck B. Under the delta plus ratio rule, B cannot be preferred over both D, which produces smaller gains than B, and A, which produces smaller losses. Guessing and decay both lead to a recovery of the expectancies of B, because on most trials this deck produces high payoffs.
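The seven-trial smoothing can be sketched as a simple moving average; whether the original filter was centered or trailing is not specified, so a trailing window is assumed here.

```python
def moving_average(series, window=7):
    """Smooth a trial-by-trial sequence with a trailing moving
    average over `window` trials; early trials average over the
    shorter prefix available so far."""
    smoothed = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```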
Under the maximization rule, the fit of the delta model improves, but it predicts almost no learning. The maximization rule likewise predicts almost no learning with the decay reinforcement model. For example, for the decay reinforcement model with the ratio choice rule, the average predicted change in a participant's choices (between the first block of 25 trials and the last) is 8.9%. Under the maximization choice rule, the change is much smaller (only 2% on average). Thus, the maximization choice rule produces a prediction that is less changeable in time, which fits the relatively flat average choice path.
As in the partial-information condition, the parameters obtained with the prediction method were used to generate simulations for the high-payoff condition with full information. Table 8 summarizes the estimated parameters (using the prediction method) for the low-payoff condition. Table 9 (top) presents the MSDs of the different models for the high-payoff condition. Only the simulations for the unequal-weight model, which produced the highest BICs, are shown.
The results show that in the full-information condition, both decay models (decay reinforcement and EWA) were advantageous over the delta model. In addition, they show (as in the partial-information condition) that the clear difference between choice rules that was observed using the prediction method disappeared. The maximization choice rule led to better fits than did the ratio rule for the delta model, and the two choice rules produced roughly similar fits for the EWA model.
Figure 4 presents the paths estimated in the full-information condition for the decay reinforcement and delta models with either the ratio or the maximization rule. The figure indicates that one likely reason for the advantage of decay over interference models is the ability of the decay models to predict the increase in choices from disadvantageous Deck B in the full-information condition as compared with the partial-information condition. It appears that in the full-path simulation, the delta model cannot predict the extent to which this deck (with its low-frequency but high negative payoff) is preferred. The MSD for this particular deck for the delta model with maximization is 0.11, in comparison with 0.07 in the corresponding decay reinforcement model (a 37% difference). For the other decks, the MSD of the delta model is better (A, 0.04; C, 0.09; D is not independent) and more similar to the decay reinforcement model (A, 0.06; C, 0.08).
In summary, although an evaluation of the learning models in the partial-information condition using the simulation method yielded less conclusive results, in the full-information condition decay models were clearly advantageous. The differences in the degree of fit between interference and decay models can largely be attributed to the fact that decay models can more easily capture the preference for the deck that featured large losses occurring 10% of the time.
Verification Using an Alternative Simulation Method
One potential limitation of the results using the simulation method is that they rely on the generalization of parameters estimated for one-step-ahead predictions to simulations of the entire average choice path. An alternative view (Haruvy & Erev, 2002) argues that such generalization is not always possible.
Supporting the latter view is the present finding that using the prediction method under all models, the attention to gains parameter W was on average higher than the attention to losses parameter L, indicating that gains loom greater in the mind than do losses (see Tables 6 and 8). This finding appears to be inconsistent with Barron and Erev's (2003) simulation results, and with other robust findings obtained using the simulation method, showing that people are more sensitive to losses than to gains. An alternative explanation is that, in the gambling task, losses occur frequently and are therefore less salient.
To guard against the possibility that the parameters assessed by the prediction method do not generalize to the simulation method (see Haruvy & Erev, 2002), we reestimated the parameters by fitting the simulations to the entire average choice path in the low-payoff condition. An examination of the two best-fitting models in this simulation (delta plus maximization and decay reinforcement with the ratio rule) shows that for both, the losses parameter was higher than the gains parameter in the partial- and full-information conditions (for conciseness, the full results are not detailed). That is, in this simulation, losses loomed greater than gains. However, the simulation with estimated parameters replicated the small advantage of decay models in the partial- and full-information conditions (for MSDs, see the bottom sections of Tables 7 and 9). Furthermore, the results with this method distinctly show that the maximization with guessing rule was as adequate as the ratio rule in the partial-information condition and that it improved fits over the ratio rule in the full-information condition.
The present study highlights the importance of examining the different assumptions that underlie learning models about the components of the learning and choice processes. Regarding the learning process, an important difference emerged between decay and interference models. The results of the analysis using the prediction method showed that, in the standard Iowa gambling task, decay models were superior to interference models across different assumptions about the choice rule. In the simulation, the advantage of decay models was less distinct for the partial-information condition, but it was clear nonetheless that decay models were not outperformed, indicating that their advantage in one-step-ahead predictions is not an artifact of post hoc fitting of parameters using information about past choices.
The advantage of decay models appeared more strongly in the full-information condition. In this condition, the advantage of decay over interference models was highest when the model ignored forgone payoffs. In this case, interference models update only the selected alternative, and they are therefore most distinct from decay models. Models assuming partial weighting of forgone payoffs take into account the experience of unchosen alternatives. Hence, under such partial weighting the similarity of the interference and decay models increases, because interference models update expectancies for unselected decks as well; the value of the interference models also appears to increase in this case. Yet, the fact that interference models do not fully update the expectancies of unchosen alternatives still appears to lead to poorer predictions. Most people's behavior is best described by a model that discounts past expectancies of alternatives, regardless of whether they were selected.
Previously, it has been suggested that in repeated-choice tasks decision makers are extremely sensitive to the value of the last three or four choices made (Hertwig et al., 2004). This is considered to be one of the factors that leads to underweighting of small probabilities in repeated choices (Barron & Erev, 2003; Hertwig et al., 2004), as opposed to overweighting them in single choices (Kahneman & Tversky, 1979). In the present context, underweighting was explained on the basis of the tendency to discount large losses that occur infrequently. Indeed, there was a positive correlation between the amount of decay in the decay reinforcement model (expressed by the recency parameter) and choices from disadvantageous Deck B, which produced rare but large losses.
One plausible interpretation for the tendency to decay the expectancies of unchosen decks is that decay in expectancy is a motivational phenomenon akin to the recency effect. The advantage of the decay model implies that players are sensitive to what has happened in the last few trials and discount outcomes from the more distant past. This tendency, coupled with a nondeterministic choice rule, seems to be important for adapting to a rapidly changing choice environment, in which the most recent trials provide a good sample of the present state of the environment (Gonzalez, Lerch, & Lebiere, 2003). Extreme decay fares less well (in terms of performance level) in a static environment with rare negative payoffs, because such rare negative payoffs are discounted more easily, which allows the expectancy of the alternatives that produced them to recover.
An alternative explanation is that the tendency of past expectancies to decay is a result of working memory limitations. That is, keeping track of unselected decks requires mental effort, especially in the full-information condition. This can reduce the attractiveness (i.e., the expectancies) of unselected decks as a function of the lag in their selection (see C. J. Anderson, 2003), as is formalized in the decay model. Future studies will be necessary to examine whether the decay in expectancy is a purely motivational phenomenon or is affected by memory constraints.
In our study, we examined the advantage of decay models using a choice task in which not updating unchosen alternatives was expected to have substantial consequences. In simpler binary choice tasks (which are more commonly examined in studies of individual decision making), the difference between the two classes of models is assumed to be relatively small, because both models change the ratio of the expectancies between the two alternatives. In the Iowa gambling task, where there are four alternatives, this difference appears to create substantial disparities between models that do and models that do not update the expectancy of unchosen alternatives.
In contrast to the converging evidence pertaining to the updating of expectancies, the results of our two evaluation methods, prediction and simulation, showed less similarity regarding choice rules. Using the prediction-of-one-step-ahead method, the ratio-of-strengths rule was advantageous under all assumptions pertaining to the updating of expectancies. Using the simulation method, the differences between the choice rules were smaller, and in some cases (most notably, in the full-information condition) the maximization rule with constant guessing actually outperformed the ratio rule. We argue that this difference derives from the value of the ratio rule for capturing individual differences in how sensitivity changes over time. The ability of the ratio rule to tap varying changes in sensitivity across performers leads to improved fit in the prediction of individual performers' choices. However, when aggregated over different performers, the change in sensitivity is marginal, and modeling it does not improve predictions.
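The two choice rules under comparison can likewise be sketched. The functional forms below (an exponential ratio-of-strengths rule with a fixed sensitivity parameter theta, and maximization with uniform guessing at a constant rate) are illustrative simplifications; in the fitted models, sensitivity may itself change over trials:

```python
import math

def ratio_of_strengths(E, theta):
    """Probabilistic rule: choice probabilities proportional to exp(theta * E).
    theta is a sensitivity parameter; larger theta approaches maximization."""
    weights = [math.exp(theta * e) for e in E]
    total = sum(weights)
    return [w / total for w in weights]

def maximize_with_guessing(E, guess_rate):
    """Deterministic rule: the deck with the maximum expectancy is chosen,
    except for uniform guessing with probability guess_rate."""
    n = len(E)
    best = max(range(n), key=lambda j: E[j])
    probs = [guess_rate / n] * n
    probs[best] += 1.0 - guess_rate
    return probs
```

For example, with expectancies [1, 0, 0, 0] and guess_rate = .2, the maximization rule assigns probability .85 to the best deck and .05 to each of the others, whereas the ratio rule spreads probability more gradually as a function of the expectancy differences.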
A Note on Methodology
The present study employed two methods for evaluating the accuracy of the assumptions of learning models. The first was based on predicting the next choice made by individual decision makers one step ahead. Under this method, part of the advantage of decay models was due to their allowing a steeper reduction in the weight of old expectancies as a function of time. The predictions of decay models may therefore mimic a regression model based on the player's previous choices. This implies that the advantage of decay models observed in the prediction method could be specific to the method of predicting the next choice ahead on the basis of the player's past experiences.
For this reason, we extended our investigation in two ways. First, we examined an agent-based simulation of an entire game path that predicted the aggregated choice pattern. This simulation method did not take input from the previous choices made by a player but, rather, received only the prior choices of the model itself (i.e., the agent). Second, we examined the generality of the simulation for a data set from which the parameters were not estimated. This was done by using the parameters estimated in one payoff condition to simulate the behavior observed in another payoff condition.
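The simulation method can be sketched as an agent loop in which the model's own choices, rather than a player's observed choices, feed the updating rule. The payoff function, parameter names, and the pairing of a decay update with an exponential ratio rule below are illustrative assumptions, not the exact fitted models:

```python
import math
import random

def simulate_agent(payoff_fn, n_trials, phi, theta, n_decks=4, seed=0):
    """Agent-based simulation: on each trial the agent chooses a deck
    probabilistically from its own expectancies, receives a payoff, and
    updates with a decay rule. No observed human choices enter the loop."""
    rng = random.Random(seed)
    E = [0.0] * n_decks
    path = []
    for t in range(n_trials):
        # ratio-of-strengths choice probabilities
        weights = [math.exp(theta * e) for e in E]
        total = sum(weights)
        r, acc, choice = rng.random(), 0.0, n_decks - 1
        for j, w in enumerate(weights):
            acc += w / total
            if r < acc:
                choice = j
                break
        u = payoff_fn(choice, rng)
        E = [phi * e for e in E]   # decay all expectancies
        E[choice] += u             # reinforce the chosen deck
        path.append(choice)
    return path
```

Aggregating such simulated paths over many agents yields the predicted average choice proportions, which can then be compared with the observed group averages (e.g., by MSD).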
Some researchers have suggested that the value of different models may be highly specific to the precise game, its parameters, and the evaluation method (Feltovich, 2000; Haruvy & Erev, 2002; Salmon, 2001). For example, it has been argued that model evaluation based on the examination of group averages can lead to different results than the examination of individuals' choices. Likewise, an evaluation based on the prediction of the next choice ahead may lead to different results than another evaluation based on the prediction of many choices ahead (see Rapoport, Daniel, & Seale, 1998; Stahl, 1996).
In the present study, differences between the rankings of basic assumptions under different evaluation techniques did appear in the ranking of choice rules. The ratio-of-strengths rule was clearly advantageous when using the prediction method, but it fared less well in a simulation of the entire average choice path. However, for another basic assumption, the updating of expectancies, there were more similarities than differences under the distinct evaluation methods.
Thus, the present study goes one step further than previous studies by showing an interaction between basic assumptions and the evaluation method. We believe we were able to observe this interaction by insisting on two methodological constraints: First, our predictions were derived at the individual level of analysis (see Hertwig & Todd, 2000). Second, we simultaneously compared different assumptions of learning models rather than whole modeling approaches. For example, consider a pair of modeling approaches: One has better predictions regarding the updating of expectancies, whereas the other has better predictions regarding the choice rule. A global comparison of the predictions of the two approaches (the common method in experimental economics) would, in this case, miss the value of the particular assumptions in each model that lead to better predictions. Under the present approach, the different components were evaluated simultaneously by examining all of the possible combinations of assumptions, and an examination of different data sets prevented potential overfitting due to multiple comparisons. The present approach therefore provides an attempt to examine learning models in a way that ensures the accumulation of knowledge regarding the value of basic assumptions for predicting behavior.
(Manuscript received June 2, 2004; revision accepted for publication November 10, 2004.)
1. The study examined the relationship between drug abuse and performance in the Iowa gambling task.
2. Choices of Deck A decreased from an average proportion of .18 to .13 [t(75) = 2.62, p < .05]. Choices of B decreased from an average of .36 to .28 [t(75) = 2.85, p < .01]. Choices of advantageous Deck C increased from an average of .23 to .29 [t(75) = 2.11, p < .05]. Choices of D likewise increased from an average of .24 to .31 [t(75) = 2.55, p < .05].
3. Equation 5 was further developed as follows: E_j(t) = E_j(t − 1) + φ · δ_j(t) · [u(t) − E_j(t − 1)] = E_j(t − 1) + φ · δ_j(t) · [W · win(t) + L · loss(t) − E_j(t − 1)] = E_j(t − 1) + δ_j(t) · [W · φ · win(t) + L · φ · loss(t) − φ · E_j(t − 1)] = E_j(t − 1) + δ_j(t) · [b_1 · win(t) + b_2 · loss(t) − φ · E_j(t − 1)]. This last step was taken to make parameter φ independent of the loss and gain parameters, as it is in the decay models.
5. An alternative reinforcement learning model defines α_jt = 1 and β_jt = δ_j(t)/(1 + t^φ), where φ is a free parameter. In this model, the weight given to the current payoff decreases as a function of time rather than of the number of choices.
6. We also examined a decrease in guessing as a function of the number of choices of a deck, but since this model did not improve predictions, for the sake of conciseness we do not report these results.
7. Camerer and Ho (1999b) suggested that in the EWA model, values of φ that are greater than 1 require a modified learning rule. We therefore constrained the value of the φ parameter in both versions of that model to be between 0 and 1. Thus, for the sake of parsimony, our models capture only specific cases of the more general EWA model.
8. This conclusion is further supported by correlations between the recency parameter and choices from the different decks under the ratio rule. There was a positive correlation between the recency parameter of the decay reinforcement model and choices from disadvantageous Deck B, which produces rare but large losses [r(56) = .28, p < .05; only for 0 < φ < 1; see Yechiam, Veinott, Busemeyer, & Stout, in press]. In contrast, there was no such association for the delta model [r(42) = .06, n.s.].
9. In this case, the difference between the EWA model and the other two is a result of the inclusion of the C, factor in the EWA model. Note also that, although the delta model is identical mathematically to the decay reinforcement model, small differences may emerge as a result of parameter constraints.
ANDERSON, C. J. (2003). The psychology of doing nothing: Forms of decision avoidance result from reason and emotion. Psychological Bulletin, 129, 139-167.
ANDERSON, J. R., & MATESSA, M. (1992). Explorations of an incremental, Bayesian algorithm for categorization. Machine Learning, 9, 275-308.
ATKINSON, R. C., & SHIFFRIN, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 2, pp. 89-195). New York: Academic Press.
BARKAN, R., ZOHAR, D., & EREV, I. (1998). Accidents and decision making under uncertainty: A comparison of four models. Organizational Behavior & Human Decision Processes, 74, 118-144.
BARRON, G., & EREV, I. (2003). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, 16, 215-233.
BECHARA, A., DAMASIO, A. R., DAMASIO, H., & ANDERSON, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50, 7-15.
BECHARA, A., & DAMASIO, H. (2002). Decision-making and addiction (part I): Impaired activation of somatic states in substance dependent individuals when pondering decisions with negative future consequences. Neuropsychologia, 40, 1675-1689.
BROADBENT, D. E. (1958). Perception and communication. London: Pergamon.
BROWN, G. W. (1951). Iterative solution of games by fictitious play. In T. C. Koopmans (Ed.), Activity analysis of production and allocation (pp. 374-376). New York: Wiley.
BUSEMEYER, J. R., & MYUNG, I. J. (1992). An adaptive approach to human decision-making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General, 121, 177-194.
BUSEMEYER, J. R., STOUT, J. C., & FINN, P. R. (in press). Using computational models to help explain decision making: Processes of substance abusers. In D. Barch (Ed.), Cognitive and affective neuroscience of psychopathology. New York: Oxford University Press.
BUSEMEYER, J. R., & WANG, Y.-M. (2000). Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44, 171-189.
BUSH, R. R., & MOSTELLER, F. (1955). Stochastic models for learning. New York: Wiley.
CAMERER, C., & Ho, T.-H. (1999a). EWA learning in games: Preliminary estimates from weak-link games. In D. V. Budescu, I. Erev, & R. Zwick (Eds.), Games and human behavior: Essays in honor of Amnon Rapoport (pp. 31-52). Mahwah, NJ: Erlbaum.
CAMERER, C., & Ho, T.-H. (1999b). Experience-weighted attraction learning in normal form games. Econometrica, 67, 827-874.
CHEUNG, Y.-W., & FRIEDMAN, D. (1997). Individual learning in normal form games: Some laboratory results. Games & Economic Behavior, 19, 46-76.
CLARK, L., & ROBBINS, T. W. (2002). Decision-making deficits in drug addiction. Trends in Cognitive Sciences, 6, 361-363.
COURNOT, A. (1960). Researches into the mathematical principles of the theory of wealth (N. Bacon, Trans.). London: Haffner.
EREV, I., & BARRON, G. (in press). On adaptation, maximization, and reinforcement learning among cognitive strategies. Psychological Review.
EREV, I., & RAPOPORT, A. (1998). Magic, reinforcement learning and coordination in a market entry game. Games & Economic Behavior, 23, 146-175.
EREV, I., & ROTH, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88, 848-881.
ESTES, W. K., & BURKE, C. J. (1953). A theory of stimulus variability in learning. Psychological Review, 60, 276-286.
FELTOVICH, N. (2000). Reinforcement-based vs. beliefs-based learning in experimental asymmetric-information games. Econometrica, 68, 605-641.
FUDENBERG, D., & LEVINE, D. K. (1995). Consistency and cautious fictitious play. Journal of Economic Dynamics & Control, 19, 1065-1089.
FUDENBERG, D., & LEVINE, D. K. (1998). The theory of learning in games. Cambridge, MA: MIT Press.
GLUCK, M. A., & BOWER, G. H. (1988). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 227-247.
GONZALEZ, C., LERCH, J. F., & LEBIERE, C. (2003). Instance-based learning in dynamic decision making. Cognitive Science, 27, 591-635.
HARLESS, D. W., & CAMERER, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62, 1251-1289.
HARUVY, E., & EREV, I. (2002). On the application and interpretation of learning models. In R. Zwick & A. Rapoport (Eds.), Experimental business research (pp. 285-300). Boston: Kluwer.
HERTWIG, R., BARRON, G., WEBER, E. U., & EREV, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15, 534-539.
HERTWIG, R., & TODD, P. M. (2000). Biases to the left, fallacies to the right: Stuck in the middle with null hypothesis significance testing: Commentary on Krueger on social bias. Psycoloquy, 11(28).
KAHNEMAN, D., & TVERSKY, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263-291.
LUCE, R. D. (1959). Individual choice behavior. New York: Wiley.
NELDER, J. A., & MEAD, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308-313.
NEWELL, A. (1992). Unified theories of cognition and the role of Soar. In J. A. Michon & A. Akyurek (Eds.), Soar: A cognitive architecture in perspective (pp. 25-75). Dordrecht: Kluwer.
OBERAUER, K., & KLIEGL, R. (2001). Beyond resources-Formal models for complexity effects and age differences in working memory. European Journal of Cognitive Psychology, 13, 187-215.
RAPOPORT, A., DANIEL, T. E., & SEALE, D. A. (1998). Reinforcement-based adaptive learning in asymmetric two-person bargaining with incomplete information. Experimental Economics, 1, 221-253.
RIESKAMP, J., BUSEMEYER, J. R., & LAINE, T. (2003). How do people learn to allocate resources? Comparing two learning theories. Journal of Experimental Psychology: Learning, Memory, & Cognition, 29, 1066-1081.
ROTH, A. E., & EREV, I. (1995). Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term. Games & Economic Behavior, 8, 164-212.
RUMELHART, D. E., MCCLELLAND, J. L., & THE PDP RESEARCH GROUP (1987). Parallel distributed processing: Explorations in the microstructure of cognition (Vols. 1 & 2). Cambridge, MA: MIT Press.
SALMON, T. (2001). An evaluation of econometric models of adaptive learning. Econometrica, 69, 1597-1628.
SARIN, R., & VAHID, F. (1999). Payoff assessments without probabilities: A simple dynamic model of choice. Games & Economic Behavior, 28, 294-309.
SARIN, R., & VAHID, F. (2001). Predicting how people play games: A simple dynamic model of choice. Games & Economic Behavior, 34, 104-122.
SCHWARZ, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
STAHL, D. (1996). Boundedly rational rule learning in a guessing game. Games & Economic Behavior, 16, 303-330.
SUTTON, R. S., & BARTO, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
YECHIAM, E., & BUSEMEYER, J. R. (2005). The effect of forgone payoffs on underweighting small probability events. Manuscript submitted for publication.
YECHIAM, E., STOUT, J. C., BUSEMEYER, J. R., ROCK, S. L., & FINN, P. R. (2005). Individual differences in the response to forgone payoffs: An examination of high functioning drug abusers. Journal of Behavioral Decision Making, 18, 97-110.
YECHIAM, E., VEINOTT, E. S., BUSEMEYER, J. R., & STOUT, J. C. (in press). Cognitive models for evaluating basic decision processes in clinical populations. In R. Neufeld (Ed.), Advances in clinical cognitive science: Formal modeling and assessment of processes and symptoms. Washington, DC: American Psychological Association.
ELDAD YECHIAM and JEROME R. BUSEMEYER
Indiana University, Bloomington, Indiana
This research was supported in part by Grant DA R01 014119 from the National Institute on Drug Abuse and by Shared University Research Grants from IBM to Indiana University. The authors thank Richard Shiffrin for his comments. Correspondence relating to this article may be sent to E. Yechiam, Behavioral Science Area, Faculty of Industrial Engineering and Management, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel (e-mail: yeldad@tx.technion.ac.il).