Academic journal article Kuram ve Uygulamada Egitim Bilimleri

Comparing Performances (Type I Error and Power) of IRT Likelihood Ratio SIBTEST and Mantel-Haenszel Methods in the Determination of Differential Item Functioning

Article excerpt

Differential item functioning (DIF) analysis is an essential step in gathering score validity evidence. DIF exists when examinees with the same ability level have different probabilities of success on a given item (Holland & Wainer, 1993). DIF can be evidence of item bias, and biased items decrease a test's validity and cause unfair scoring. In an educational context, results of DIF studies can be used to build more valid, fairer measurements. Many statistical procedures have been developed to determine whether a test item is functioning differentially (Narayanan & Swaminathan, 1994). Some of these techniques are based on classical test theory (CTT) and others on item response theory (IRT). Mantel-Haenszel (MH) (Holland & Thayer, 1988), logistic regression (LR) (Swaminathan & Rogers, 1990), and SIBTEST (Shealy & Stout, 1993) are based on CTT. Examples of IRT-based methods are Lord's chi-square test, Raju's area measures, and the likelihood ratio test. DIF is examined by comparing the item response distributions of two groups of examinees with equal ability levels. Examinees with the same knowledge should respond similarly to test questions, regardless of their group membership. Differences in the distributions are interpreted as DIF (Steinberg & Thissen, 2006).

CTT-based methods compare groups' score distributions, whereas IRT-based methods compare the probabilities of responding correctly to the items. IRT methods are model based, and the parameters compared depend on the model. For example, in the one-parameter logistic model (1PLM), groups are compared with respect to the item difficulty parameter b; in the two-parameter logistic model (2PLM), both the item discrimination parameter a and the b parameter are compared. Between-group differences in the b parameter indicate uniform DIF; differences in the a parameter indicate non-uniform DIF.
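
The distinction between uniform and non-uniform DIF can be illustrated with a minimal sketch of the logistic item response function. The function name p_correct, the parameter values, and the ability grid below are illustrative assumptions, not values taken from the study.

```python
import math

def p_correct(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model.

    With c = 0 this reduces to the 2PL model; a is the discrimination
    parameter, b the difficulty parameter, c the lower asymptote.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical ability levels at which the two groups are compared.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Uniform DIF: same a, but the item is harder (larger b) for the focal group.
# The reference group is favored at every ability level.
uniform_gap = [p_correct(t, 1.0, 0.0) - p_correct(t, 1.0, 0.5) for t in thetas]

# Non-uniform DIF: same b, but a differs between the groups.
# The sign of the gap changes along the ability scale.
nonuniform_gap = [p_correct(t, 1.0, 0.0) - p_correct(t, 0.5, 0.0) for t in thetas]

for t, u, n in zip(thetas, uniform_gap, nonuniform_gap):
    print(f"theta={t:+.1f}  uniform gap={u:+.3f}  non-uniform gap={n:+.3f}")
```

In this sketch the uniform gap keeps the same sign across the ability range, while the non-uniform gap is negative for low abilities and positive for high abilities, which is the crossing pattern that a difference in the a parameter produces.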

DIF detection methods do not perform equally well. IRT-based methods are theoretically powerful, but they require large samples, a condition that is difficult to satisfy in practice (Narayanan & Swaminathan, 1994). Studies of the methods' performance have found that it can be affected by several factors, e.g., test length, sample size, test group size, group mean difference, standard deviation difference, distribution of difference, and the interactions of these factors (Ackerman & Evans, 1992; Finch, 2005; Finch & French, 2007; Kim, 2010; Narayanan & Swaminathan, 1994; Prieto, Barbero, & San Luis, 1997; Rogers & Swaminathan, 1993; Roussos & Stout, 1996; Shealy & Stout, 1993).

In this research, the Type I error rates and power of the MH, IRT-LR, and SIBTEST methods are investigated as a function of sample size, ability differences between groups, test length, percentage of DIF items, and the underlying model (2PL and 3PL). Below, these three methods are explained in detail.

DIF Detection Methods

Studies on DIF in Turkey have mostly used the MH and LR methods (Bakan-Kalaycioglu & Kelecioglu, 2011; Bekçi, 2007; Çepni, 2011; Karakaya, 2012; Karakaya & Kutlu, 2012). In the present study, the MH, SIBTEST, and IRT-LR methods were used. The LR method was not used for two main reasons: (1) in some studies, the LR method gave the same results as the MH method (Ankenmann, Witt, & Dunbar, 1996; DeMars, 2009; Vaughn & Wang, 2010), and (2) the error rate of the LR method was very high and its statistical power was lower (Dainis, 2008; Hidalgo & Lopez-Pina, 2004; Jodoin & Gierl, 2001; Li, Brooks, & Johanson, 2012). In addition, the main weakness of the LR method in DIF determination is a tendency to produce higher Type I error (Li & Stout, 1996; Narayanan & Swaminathan, 1996; Rogers & Swaminathan, 1993; Swaminathan & Rogers, 1990).

Mantel-Haenszel (MH): The MH procedure is a common method in DIF detection. This procedure was developed to detect uniform DIF, and it is based on chi-square statistics. Also, the MH procedure is based on estimating the probability of a member of the reference or focal group (Agresti, 1984). …
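
The excerpt breaks off before stating the statistic itself. As a hedged illustration only, the sketch below computes the MH common odds ratio, the continuity-corrected MH chi-square, and the ETS delta transformation in their standard textbook form (cf. Holland & Thayer, 1988), which may differ from the article's own presentation. The function name mantel_haenszel, the table layout, and the example counts are assumptions made for this sketch.

```python
import math

def mantel_haenszel(tables):
    """MH statistics for one item from 2x2 tables, one per matched score level.

    Each table is (A, B, C, D): reference correct, reference incorrect,
    focal correct, focal incorrect. Returns (alpha, chi2, delta).
    """
    num = den = sum_a = sum_e = sum_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a level with fewer than two examinees carries no information
        n_ref, n_foc = a + b, c + d   # group sizes at this score level
        m1, m0 = a + c, b + d         # numbers correct and incorrect at this level
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m1 / n                               # E(A) under no DIF
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))  # Var(A) under no DIF
    alpha = num / den                                   # common odds ratio
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_var    # continuity-corrected, 1 df
    delta = -2.35 * math.log(alpha)                     # ETS delta scale (MH D-DIF)
    return alpha, chi2, delta

# Hypothetical counts at two matched score levels.
no_dif = [(20, 20, 10, 10), (30, 10, 15, 5)]
with_dif = [(30, 10, 10, 10), (35, 5, 12, 8)]

print(mantel_haenszel(no_dif))
print(mantel_haenszel(with_dif))
```

An odds ratio near 1 (delta near 0) is consistent with no uniform DIF, while an odds ratio above 1 (negative delta) indicates that the item favors the reference group among examinees matched on total score.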
