Comparison of Basic Assumptions Embedded in Learning Models for Experience-Based Decision Making

The present study examined basic assumptions embedded in learning models for predicting behavior in decisions based on experience. In such decisions, the probabilities and payoffs are initially unknown and are learned through repeated choices with payoff feedback. We examined combinations of two learning rules for updating past experience with new payoff feedback and two choice rules for mapping experience onto choices. Crossing these assumptions produced four classes of models that were systematically compared. Two methods were employed to evaluate how well the learning models approximated players' choices: The first was based on estimating parameters from each person's data to maximize the prediction of choices one step ahead, conditioned on the observed past history of feedback. The second was based on making a priori predictions for the entire sequence of choices using parameters estimated from a separate experiment. The results indicated an advantage for the class of models incorporating decay of previous experience, whereas the ranking of choice rules depended on the evaluation method used.

Interest has recently been growing in learning models applied to choices in repeated-play games. Studies of choice behavior in individual (see, e.g., Busemeyer & Myung, 1992; Erev & Barron, in press; Sarin & Vahid, 1999) and multiplayer (see, e.g., Camerer & Ho, 1999b; Cheung & Friedman, 1997; Erev & Rapoport, 1998; Erev & Roth, 1998; Fudenberg & Levine, 1995; Sarin & Vahid, 2001; Stahl, 1996) games have shown that learning in repeated-choice problems can be summarized by surprisingly simple mathematical models. The purpose of this article is to provide a systematic comparison of the basic assumptions used to construct models of decision making based on experience from repeated play.

We evaluated four classes of models that were formed by combining two basic assumptions about learning rules with two basic assumptions about choice rules (see Table 1). The learning rules in these models differed according to the manner in which past experience was updated on the basis of new feedback: In one class, called the interference models, only the chosen option was updated, and unchosen options remained unchanged; in the other, called the decay models, the chosen option was updated and the unchosen options were discounted by some amount. The choice rules differed according to the manner in which past experience was mapped onto choice behavior: In one class, the option producing the maximum expectation was always chosen (with some guessing allowed); in the other, choices were probabilistically determined by the strength of expectation.
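To make these four combinations concrete, the following sketch pairs each learning rule with each choice rule in Python. The specific functional forms and parameter names (a delta-rule learning rate alpha, a decay factor phi, a guessing rate epsilon, and a sensitivity theta in a softmax rule) are illustrative assumptions adopted for exposition; they are not necessarily the exact equations of the models compared in this article.

```python
import numpy as np

def update_interference(E, choice, payoff, alpha):
    """Interference-type learning rule (assumed delta-rule form):
    only the chosen option's expectancy moves toward the new payoff;
    unchosen options remain unchanged."""
    E = E.copy()
    E[choice] += alpha * (payoff - E[choice])
    return E

def update_decay(E, choice, payoff, phi):
    """Decay-type learning rule (assumed form): every expectancy is
    discounted by phi, then the new payoff is added to the chosen option."""
    E = phi * E
    E[choice] += payoff
    return E

def choose_maximizing(E, epsilon, rng):
    """Deterministic choice rule: pick the option with the maximum
    expectancy, guessing at random with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(len(E)))
    return int(np.argmax(E))

def choose_probabilistic(E, theta, rng):
    """Probabilistic choice rule (assumed softmax form): choice
    probability increases with the strength of the expectancy."""
    p = np.exp(theta * (E - E.max()))
    p /= p.sum()
    return int(rng.choice(len(E), p=p))
```

Crossing the two update functions with the two choice functions yields the four model classes summarized in Table 1.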

Two methods were used to compare the empirical validity of the models. The first was based on "one-step-ahead" predictions, and the second on simulation of the entire game. Under the first method, the model predicted the player's next choice using the past history of payoffs actually experienced by that player. In this case, the model parameters were estimated separately for each player to maximize the likelihood of the observed choices. Under the second method, model simulations were generated to predict the proportion of choices, averaged across players, for the entire length of the game. In this case, parameters estimated from one experiment were used to generate a priori predictions for a second experiment. The advantage of the first method was that it allowed tests of the models at the individual level, but it also had two disadvantages: (1) it had to rely on an individual's actual past choices, and (2) it relied on fitting parameters to the data. The second method circumvented both of these disadvantages, but it could not be used to test models at the individual level. By using both methods, we hoped to achieve a convergence of evidence.
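The two evaluation methods can be sketched in the same way. The code below assumes the decay/softmax combination from the previous sketch; the parameterization (phi, theta), the optimizer, the bounds, and the starting values are illustrative choices, not the estimation procedure reported here.

```python
import numpy as np
from scipy.optimize import minimize

def one_step_ahead_nll(params, choices, payoffs, n_options=4):
    """Method 1: negative log-likelihood of each observed choice,
    conditioned on the player's actual history of choices and payoffs."""
    phi, theta = params
    E = np.zeros(n_options)
    nll = 0.0
    for choice, payoff in zip(choices, payoffs):
        p = np.exp(theta * (E - E.max()))
        p /= p.sum()
        nll -= np.log(p[choice] + 1e-12)  # likelihood of the next choice
        E = phi * E                       # decay all expectancies
        E[choice] += payoff               # add feedback to the chosen option
    return nll

def fit_player(choices, payoffs):
    """Estimate parameters separately for one player by maximizing the
    one-step-ahead likelihood (illustrative bounds and start values)."""
    return minimize(one_step_ahead_nll, x0=[0.9, 1.0],
                    args=(np.asarray(choices), np.asarray(payoffs)),
                    bounds=[(0.0, 1.0), (0.0, 10.0)])

def simulate_game(params, payoff_fn, n_trials, n_options=4, seed=None):
    """Method 2: a priori simulation of the entire game with parameters
    fixed in advance (e.g., estimated from a separate experiment)."""
    rng = np.random.default_rng(seed)
    phi, theta = params
    E = np.zeros(n_options)
    choices = []
    for _ in range(n_trials):
        p = np.exp(theta * (E - E.max()))
        p /= p.sum()
        choice = int(rng.choice(n_options, p=p))
        payoff = payoff_fn(choice)        # feedback from the environment
        E = phi * E
        E[choice] += payoff
        choices.append(choice)
    return choices
```

Choice proportions from repeated runs of simulate_game, averaged across simulated players, would then be compared with observed learning curves, whereas fit_player yields the individual-level, one-step-ahead fits.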

A four-alternative choice task was chosen for model comparison rather than a simple two-alternative (binary) task because it allowed us to explore the main difference between interference and decay models. …