Academic journal article Economic Inquiry

The Harder the Task, the Higher the Score: Findings of a Difficulty Bias

Article excerpt

Studies have found that going first or last in a sequential-order contest leads to a biased outcome, commonly called order bias (or primacy and recency). Studies have also found that judges tend to reward contestants they recognize with additional points, called reference bias. Controlling for these known biases, we test for a new type of bias we refer to as "difficulty bias," which reveals that athletes attempting more difficult routines receive higher execution scores, even when difficulty and execution are judged separately. Despite some identification challenges, we add to the literature by finding strong evidence of a difficulty bias in gymnastics. We also provide generalizations beyond athletics. (JEL L10, L83, D81, J70, Z1)


Judgments are made in many areas of life: job interviews, refereed journal articles, marketing pitches, oral and written exam grades, auditions, sporting events, debates, or even stock analyst estimates. In areas where judges determine the outcome of an event, bias in the judging process can create problems. Biased judging potentially leads to questions about efficiency and fairness, particularly if it results in selecting less than optimal candidates (Page and Page 2010).

Judging and perception biases have been observed in a variety of situations. Psychologists show that sequential presentation of information can influence the way the information is processed (Mussweiler 2003). This idea has been carried over to other fields including economics (Neilson 1998; Page and Page 2010; Sarafidis 2007) and marketing (Novemsky and Dhar 2005). Judging bias has been found in orchestra auditions (Goldin and Rouse 2000) and sequential voting through the "Idol" series (Page and Page 2010). Bias has also been found in basketball referees (Price and Wolfers 2010).

We test for bias in the judging of elite gymnastics. In particular, the gymnastics meet we analyze provides a uniquely suitable dataset: the order of competition is randomly assigned to a given country, and the difficulty and execution of a routine are separately judged. (1) Following previous biases found in the literature, we control for performance order (primacy and recency) and reference bias. Despite some unit analysis challenges in our control for reference bias and identification issues concerning our lack of a perfect control for athlete ability, we add to the literature by finding strong evidence of difficulty bias; execution judges show a favorable bias for those athletes attempting more difficult routines.

Measuring difficulty bias requires data where judgment is delivered in two parts: difficulty and execution. This can be found in the world of elite level gymnastics. Elite gymnasts receive scores based on the difficulty of the task and the execution of this task. One panel of judges is charged to evaluate the execution, and only the execution, of the routine, with an independent panel of judges evaluating the difficulty, and only the difficulty, of each routine. In other words, execution judges should not be concerned with the difficulty of the routine, and difficulty judges should not be influenced by the execution. Because the judges sit on separate panels, we can determine if the difficulty of the routine influences the execution score.

Using data normalized to mean zero and a standard deviation of one, we regress execution score on difficulty score, with additional controls. We find that a participant's overall score is artificially inflated when that athlete attempts a more difficult routine. Figure 1 shows the extent of this bias. Increasing one's difficulty by one standard deviation artificially inflates the execution measure by 0.21 standard deviations.

Likewise, attempting a less difficult routine, one that is one standard deviation below the mean, decreases the execution score by 0.45 standard deviations.
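
The standardized regression described above can be illustrated with a minimal sketch. It is not the authors' specification: the function names (`zscore`, `standardized_slope`), the single performance-order control, and the simulated scores are all assumptions made for illustration, and the simulated numbers are not the study's data. The sketch only shows how a coefficient on normalized data reads as "standard deviations of execution per standard deviation of difficulty."

```python
import numpy as np


def zscore(x):
    """Normalize a series to mean zero and standard deviation one."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def standardized_slope(difficulty, execution, controls=None):
    """OLS coefficient on z-scored difficulty when z-scored execution is
    regressed on it, an intercept, and optional control columns.

    Hypothetical helper: the article's actual control set is not
    reproduced here.
    """
    d = zscore(difficulty)
    e = zscore(execution)
    columns = [np.ones_like(d), d]
    if controls is not None:
        controls = np.asarray(controls, dtype=float)
        if controls.ndim == 1:
            controls = controls[:, None]
        columns.extend(controls.T)  # one design-matrix column per control
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, e, rcond=None)
    return beta[1]  # coefficient on standardized difficulty


# Simulated scores for illustration only (assumed values, not real data).
rng = np.random.default_rng(0)
n = 500
difficulty = rng.normal(5.5, 0.6, n)
order = rng.integers(1, 9, n).astype(float)  # assumed performance-order control
execution = 8.5 + 0.3 * difficulty + rng.normal(0.0, 0.5, n)

slope = standardized_slope(difficulty, execution, controls=order)
print(f"SDs of execution per one-SD increase in difficulty: {slope:.2f}")
```

With no controls, the standardized slope equals the Pearson correlation between the two scores, which is a convenient sanity check on the normalization step.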

This finding has major implications for the ability of judges to accurately rank individuals. …
