Academic journal article Social Behavior and Personality: an international journal

A Comparative Study of Two Standard-Setting Techniques

Article excerpt

This study compares the Angoff method (1971) for setting standards with the Ebel method (1972), using a test in introductory psychology. Although the results showed a significant difference between the minimum passing scores obtained through the two methods, they also showed strong agreement between the methods in the number of students classified as pass/fail. The Angoff method tended to be more stringent, producing a higher minimum passing score than the Ebel method.

Keywords: Angoff method, Ebel method, standard setting, minimum passing score.

Over the past five decades, numerous procedures for establishing performance standards on competency tests have been introduced and refined. A minimum passing score (MPS) on a test represents an answer to the question, "How much is enough?" The MPS indicates the level of knowledge or skill that will be considered sufficient for a specific purpose (Livingston & Zieky, 1989).

Broadly speaking, there are three approaches to standard setting. The first involves an inspection of the content of a test by one or more expert judges, who render a judgment based on a holistic impression of test content. The second approach is based on the performance of the examinee. The third is based on judgments of individual item content (Crocker & Algina, 1986).

These judgmental methods are based entirely on the judgments of one or more persons. Where several judges are involved, the decisions may be reached independently or from a panel discussion (Chang, Dziuban, Hynes, & Olson, 1996).

In practice, however, judges are not always neutral and fair when they rate a particular attribute of a set of objects. Some judges give balanced ratings to objects in different groups. Others give ratings that show impact, that is, the average rating of objects belonging to one group is higher than the average rating of objects belonging to another group. Still others function differentially: they give consistently higher or lower ratings to objects from one group, even when objects from the other group have the same standing on the scale of the concept being measured. In addition, some judges tend to rate more generously than others. Each of these cases can be a problem for any judgmental standard-setting method, because it may artificially inflate or shrink the cutoff score.

Another problem that may lead to a different MPS is the way judges think of the objects being rated when different methods are used. Different methods require judges to perform different tasks and to think of the same object in a different way each time. When judges differ in their conceptualizations of minimal competency, judgmental inconsistency may arise. Likewise, inconsistency may occur among judges when they are unable to maintain their conceptualizations of minimal competency across the items on the test (Chang et al., 1996).

If it is true that an MPS may differ significantly when determined through different methods (e.g., Angoff, 1971; Ebel, 1972), then numerous errors may occur when these scores are used to determine students' pass/fail status. In other words, the standard applied will affect which persons are classified as masters or nonmasters, or which persons are judged to have achieved a certain level of competent performance within some domain. Similarly, in employment testing, these scores often play an essential part in important decisions regarding employee selection (Maurer, Alexander, Callahan, Bailey, & Dambrot, 1991).
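To make the dependence of pass/fail decisions on the chosen method concrete, the following is a minimal sketch of how an MPS is commonly computed under each method and then applied to the same set of scores. It is not the procedure or data used in this study. All ratings, item counts, and examinee scores below are invented for illustration, and the Ebel calculation is simplified to a single set of panel-level values per difficulty-by-relevance cell.

```python
from statistics import mean


def angoff_mps(ratings):
    """ratings[j][i] is judge j's estimated probability that a minimally
    competent examinee answers item i correctly."""
    # Each judge's cutoff is the sum of that judge's item probabilities.
    judge_cutoffs = [sum(judge) for judge in ratings]
    # The panel MPS is the mean of the judges' cutoffs.
    return mean(judge_cutoffs)


def ebel_mps(cell_counts, cell_proportions):
    """cell_counts[cell] is the number of items sorted into a
    (difficulty, relevance) cell; cell_proportions[cell] is the judged
    proportion of those items a borderline examinee would answer correctly."""
    return sum(cell_counts[c] * cell_proportions[c] for c in cell_counts)


def classify(scores, mps):
    """Label each raw score as pass or fail against a cutoff."""
    return ["pass" if s >= mps else "fail" for s in scores]


# Hypothetical data: a 4-item test, two Angoff judges.
angoff_ratings = [
    [0.5, 0.75, 0.5, 0.25],
    [0.75, 0.75, 0.5, 0.5],
]
# Hypothetical Ebel sorting of the same 4 items into two cells.
ebel_counts = {("easy", "essential"): 2, ("hard", "important"): 2}
ebel_props = {("easy", "essential"): 0.75, ("hard", "important"): 0.25}

scores = [1, 2, 3, 4]  # hypothetical examinee raw scores

angoff_cut = angoff_mps(angoff_ratings)
ebel_cut = ebel_mps(ebel_counts, ebel_props)
angoff_status = classify(scores, angoff_cut)
ebel_status = classify(scores, ebel_cut)

# Proportion of examinees given the same status by both cutoffs.
agreement = mean(a == e for a, e in zip(angoff_status, ebel_status))

print(angoff_cut, ebel_cut, agreement)
```

With these invented numbers the two cutoffs differ by a quarter of a point, yet only the examinee whose score falls between them is classified differently. This shows how two methods can yield different MPS values while still agreeing on most pass/fail decisions.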

A number of studies have compared different procedures for setting standards. According to Berk (1986), more than 35 methods have been proposed for setting standards, most of them developed during the 1970s. In addition, more than 20 investigations during the early 1980s compared the various judgmental and empirical-judgmental methods. …
