International Journal of Management

Unweighted and Weighted Kappa as Measures of Agreement for Multiple Judges

Article excerpt

Unweighted and weighted kappa are widely used to measure the degree of agreement between two independent judges. Extension of unweighted and weighted kappa to three or more judges has traditionally involved measuring pairwise agreement among all possible pairs of judges. In this paper, unweighted and weighted kappa are defined for multiple judges and compared with pairwise kappa. Also, exact variance and resampling permutation procedures are described that yield approximate probability values.

1: Introduction

The classification of objects into categories and ordered categories is common in business and management research. It is sometimes important to assess agreement among classifications made by multiple judges. For example, one may wish to measure agreement among a committee of upper-level managers evaluating candidates for promotion to vice-president, among a panel of judges rating Small Business Innovation Research (SBIR) proposals, or among managers assessing a group of interns for a permanent position.

Cohen (1960) introduced unweighted kappa, a chance-corrected index of interjudge agreement for categorical variables. Kappa is 1 when perfect agreement between two judges occurs, 0 when agreement is equal to that expected under independence, and negative when agreement is less than expected by chance (Fleiss et al., 2003, p. 434). Weighted kappa (Spitzer et al., 1967; Cohen, 1968) is widely used for ordered categorical data (Cicchetti, 1981; Kramer and Feinstein, 1981; Banerjee et al., 1999; Kingman, 2002; Ludbrook, 2002; Perkins and Becker, 2002; Fleiss et al., 2003, p. 608; Kundel and Polansky, 2003; Schuster, 2004; Berry et al., 2005). Whereas unweighted kappa does not distinguish among degrees of disagreement, weighted kappa incorporates the magnitude of each disagreement and provides partial credit for disagreements when agreement is not complete (Maclure and Willett, 1987). The usual approach is to assign a weight to each disagreement pair, with larger weights indicating greater disagreement.
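Although the formulae for the multi-judge statistics are omitted from this excerpt, the conventional two-judge definitions that they generalize are standard (Cohen, 1960, 1968) and are recalled here only for orientation. Writing $p_o$ for the observed proportion of agreement, $p_e$ for the proportion of agreement expected by chance, $p_{ij}$ for the proportion of objects placed in category $i$ by the first judge and category $j$ by the second, $p_{i\cdot}$ and $p_{\cdot j}$ for the corresponding marginal proportions, and $w_{ij}$ for disagreement weights (larger weights for greater disagreement), the two-judge forms are

\kappa = \frac{p_o - p_e}{1 - p_e}
\qquad \text{and} \qquad
\kappa_w = 1 - \frac{\sum_{i=1}^{r} \sum_{j=1}^{r} w_{ij}\, p_{ij}}{\sum_{i=1}^{r} \sum_{j=1}^{r} w_{ij}\, p_{i\cdot}\, p_{\cdot j}},

where $r$ is the number of categories. The multi-judge generalizations of Mielke et al. (2007a, 2008) are developed in Sections 3 and 6 of the article.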

While both unweighted and weighted kappa are conventionally used to measure the degree of agreement between two independent judges, the extension of unweighted and weighted kappa to three or more judges has been problematic. One popular approach has been to compute kappa coefficients for all pairs of judges, i.e., pairwise interobserver kappa (Fleiss, 1971; Light, 1971; Conger, 1980; Kramer and Feinstein, 1981; Schouten, 1980, 1982a, 1982b; Epstein et al., 1986; Herman et al., 1990; Taplin et al., 2000; Kundel and Polansky, 2003; Schorer and Weiss, 2007). Pairwise kappa is akin to averaging pairs of t tests instead of utilizing an F test, or averaging zero-order correlation coefficients instead of employing multiple correlation. The problem is exacerbated when trying to define appropriate probability values, because the pairwise probability values are not orthogonal: general procedures for combining probability values due to Fisher (1934, 1948) and Edgington (1972) require an independence that is not satisfied by pairwise comparisons. Section 2 describes and compares a variety of pairwise unweighted kappas; Section 3 describes a recently introduced unweighted kappa for multiple judges (Mielke et al., 2007a, 2008); Section 4 discusses a resampling permutation procedure to obtain the probability of an observed unweighted kappa; Section 5 contains an example analysis of unweighted kappa; Section 6 introduces a recently published weighted kappa for multiple judges (Mielke et al., 2007a, 2008); Section 7 describes exact variance and resampling permutation procedures to obtain the probability of an observed weighted kappa; and Section 8 provides an example analysis of weighted kappa with multiple judges. Although the unweighted and weighted kappas introduced in this paper are appropriate for any number of r > 2 disjoint ordered categories and m > 2 judges, the description of the procedure and the example are confined to m = 4 judges to simplify presentation. …
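The formulae for the multi-judge kappas, their exact variances, and the permutation procedures appear in the sections that follow and are omitted from this excerpt. Purely as an illustration of the pairwise approach discussed above, and of how a resampling permutation procedure can yield an approximate probability value, the following Python sketch computes the mean pairwise unweighted kappa for m judges and then independently permutes each judge's ratings across objects to estimate the probability of a value at least as large as the one observed under chance agreement. The function names and the toy ratings are illustrative assumptions; this is not the Mielke et al. (2007a, 2008) multi-judge statistic.

import numpy as np
from itertools import combinations

def cohen_kappa(x, y, categories):
    # Unweighted Cohen's kappa for two judges rating the same objects.
    n = len(x)
    idx = {c: i for i, c in enumerate(categories)}
    table = np.zeros((len(categories), len(categories)))
    for a, b in zip(x, y):
        table[idx[a], idx[b]] += 1          # joint classification counts
    p = table / n
    po = np.trace(p)                        # observed proportion of agreement
    pe = p.sum(axis=1) @ p.sum(axis=0)      # proportion expected by chance
    return (po - pe) / (1 - pe)

def mean_pairwise_kappa(ratings, categories):
    # Average unweighted kappa over all pairs of judges (rows = judges).
    m = ratings.shape[0]
    return np.mean([cohen_kappa(ratings[i], ratings[j], categories)
                    for i, j in combinations(range(m), 2)])

def permutation_pvalue(ratings, categories, n_perm=10000, seed=0):
    # Approximate probability value: independently permute each judge's
    # ratings across objects, which preserves each judge's marginal
    # distribution while destroying any association among the judges.
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_kappa(ratings, categories)
    count = 0
    for _ in range(n_perm):
        shuffled = np.array([rng.permutation(row) for row in ratings])
        if mean_pairwise_kappa(shuffled, categories) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical data: m = 4 judges classifying 8 objects into r = 3 categories.
ratings = np.array([[1, 2, 3, 1, 2, 3, 1, 2],
                    [1, 2, 3, 1, 2, 2, 1, 3],
                    [1, 2, 3, 2, 2, 3, 1, 2],
                    [1, 3, 3, 1, 2, 3, 1, 2]])
kappa, p = permutation_pvalue(ratings, [1, 2, 3])
print(f"mean pairwise kappa = {kappa:.3f}, approximate p-value = {p:.4f}")

A weighted analogue would replace cohen_kappa with a version that applies disagreement weights w_ij, as in the weighted kappa formula given earlier.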
