Academic journal article International Journal of Applied Educational Studies

Identifying Causes of English-Chinese Translation Differential Item Functioning

Differences in the test performance of population subgroups, and the related issue of test fairness, have been of major interest since the late 1960s (Cole & Zieky, 2001). Such differences are often examined through differential item functioning (DIF). DIF analysis is a statistical procedure used to determine whether test questions are fair and appropriate for assessing the knowledge of various population subgroups, regardless of their gender, race, or ethnicity. An item exhibits DIF when it performs differently for two groups of examinees after controlling for ability (Shepard, Camilli, & Averill, 1981). Items identified as displaying DIF may pose a considerable threat to both the validity and the fairness of a test (Kim, 2001).
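To make "after controlling for ability" concrete, the sketch below compares two groups on a single item within matched total-score levels rather than overall. It is a minimal illustration, not the article's method: it assumes dichotomously scored items, total test score as the ability proxy, and hypothetical group labels "ref" (reference) and "focal". The index it computes, commonly called STD P-DIF in the standardization approach, averages the focal-minus-reference difference in proportion correct across score levels, weighting each level by its number of focal-group examinees.

```python
from collections import defaultdict


def std_p_dif(scores, groups, item_correct, reference="ref", focal="focal"):
    """Standardized focal-minus-reference difference in proportion correct.

    Assumption: `scores` (e.g. total test score) is the matching variable, and
    the group labels "ref"/"focal" are hypothetical placeholders.
    """
    # counts[level][group] = [number answering the item correctly, group size]
    counts = defaultdict(lambda: {reference: [0, 0], focal: [0, 0]})
    for s, g, y in zip(scores, groups, item_correct):
        if g not in (reference, focal):
            continue  # ignore examinees outside the two compared groups
        counts[s][g][0] += y
        counts[s][g][1] += 1

    weighted_sum, total_weight = 0.0, 0
    for tally in counts.values():
        n_ref, n_foc = tally[reference][1], tally[focal][1]
        if n_ref == 0 or n_foc == 0:
            continue  # a score level is usable only if both groups appear in it
        p_ref = tally[reference][0] / n_ref
        p_foc = tally[focal][0] / n_foc
        weighted_sum += n_foc * (p_foc - p_ref)  # weight by focal-group size
        total_weight += n_foc
    if total_weight == 0:
        raise ValueError("no score level contains both groups")
    return weighted_sum / total_weight


def proportion_correct(groups, item_correct, label):
    """Unconditional proportion correct for one group (no ability matching)."""
    picked = [y for g, y in zip(groups, item_correct) if g == label]
    return sum(picked) / len(picked)


# Toy data: two total-score levels; the reference group is concentrated at the higher level.
scores = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
groups = ["ref", "ref", "focal", "focal", "focal", "focal",
          "ref", "ref", "ref", "ref", "focal", "focal"]
item = [1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1]

raw_gap = proportion_correct(groups, item, "focal") - proportion_correct(groups, item, "ref")
print(f"raw gap:   {raw_gap:.3f}")                           # -0.167
print(f"STD P-DIF: {std_p_dif(scores, groups, item):.3f}")   # 0.000
```

In this toy data the focal group answers the item correctly less often overall (4 of 6 versus 5 of 6), yet within each score level both groups succeed at the same rate, so the index is 0: the raw gap reflects the groups' different score distributions, not DIF.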

Fairness has been a priority in the field of educational testing during the past few decades (Cole & Zieky, 2001). For any large-scale test, the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) (1999) recommend the evaluation of fairness as a standard procedure. Test fairness has also become an important political and legal issue (Garcia & Pearson, 1994). Standards and guidelines for fair testing practices (AERA, APA, & NCME, 1999) require testing organizations and individual professionals to make tests as fair as possible for all test takers, regardless of their gender, ethnicity, or linguistic background (Kunnan, 2000).

It has repeatedly been shown that DIF may arise in comparisons between groups defined by gender (Kranzler & Miller, 1999; Mendes-Barnett, 2006), language, and ethnicity (Allalouf, Hambleton, & Sireci, 1999; Hauger & Sireci, 2008; Sammons, 1995). Gender-based DIF research has explored which gender group is favored by particular items on a test (Hamilton, 1999; Mendes-Barnett, 2006; Sammons, 1995; Walstad & Robson, 1997). Language- and ethnicity-based DIF studies investigate performance differences on test items between different language and ethnic groups (Elosua & Lopez-Jauregui, 2007; Gierl, Rogers, & Klinger, 1999; Kim, 2001; Kunnan, 1990; Ong & Sireci, 2008; Sasaki, 1991).

In addition, educational and cognitive tests are increasingly being translated and used in different linguistic and cultural contexts. When a test is translated from one language into another, the two versions are generally not psychometrically equivalent (Allalouf, 2003). DIF analysis can reveal those items whose psychometric characteristics have changed through translation (Allalouf, 2003). As a result, translation DIF, which is related to examinees' linguistic and cultural backgrounds, has prompted growing interest in fairness issues among measurement professionals (Allalouf et al., 1999; Gierl et al., 1999; Gierl & Khaliq, 2001).

DIF, Item Bias, and Item Impact

It is important to distinguish among three terms: DIF, item bias, and item impact. First, for a test to be acceptable, it must be fair to all subgroups of the population for which it is to be used. Both judgmental and empirical approaches are used to examine fairness issues (Popham, 2008). DIF analysis is an empirical approach that identifies a statistically significant difference in performance on a test item across two or more groups of examinees after the examinees have been matched on the construct of interest (Wainer, Sireci, & Thissen, 1991).
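The excerpt does not say which DIF detection method the study uses. As one illustration of matching examinees on the construct and then comparing item performance, here is a minimal sketch of the Mantel-Haenszel procedure, a widely used DIF method swapped in for illustration. It assumes dichotomously scored items, total test score as the matching variable, and hypothetical group labels "ref" and "focal". At each score level it forms a 2 × 2 table of group by correct/incorrect, pools the tables into the common odds ratio α_MH, and converts it to the ETS delta scale, MH D-DIF = −2.35 ln(α_MH).

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(scores, groups, item_correct, reference="ref", focal="focal"):
    """Mantel-Haenszel common odds ratio and ETS delta (MH D-DIF) for one item.

    Assumption: examinees are matched on `scores` (e.g. total test score), the
    item is scored 0/1, and "ref"/"focal" are hypothetical group labels.
    """
    # tables[level] = [A, B, C, D] where
    #   A = reference correct, B = reference incorrect,
    #   C = focal correct,     D = focal incorrect
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for s, g, y in zip(scores, groups, item_correct):
        if g == reference:
            tables[s][0 if y else 1] += 1
        elif g == focal:
            tables[s][2 if y else 3] += 1

    num = 0.0  # sum over score levels of A*D/N
    den = 0.0  # sum over score levels of B*C/N
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is zero or undefined for these data")

    alpha_mh = num / den  # > 1 means higher odds of success for the reference group
    mh_d_dif = -2.35 * math.log(alpha_mh)  # negative values: item favors the reference group
    return alpha_mh, mh_d_dif


def expand(level, a, b, c, d):
    """Build toy examinee records (score level, group, item score) from 2x2 cell counts."""
    rows = [("ref", 1)] * a + [("ref", 0)] * b + [("focal", 1)] * c + [("focal", 0)] * d
    return [(level, g, y) for g, y in rows]


# Toy data: at both score levels the reference group's odds of success are 4/2 = 2
# and the focal group's are 2/2 = 1, so the odds ratio is 2 at each level.
records = expand(1, 4, 2, 2, 2) + expand(2, 4, 2, 2, 2)
scores, groups, item = zip(*records)
alpha, delta = mantel_haenszel_dif(scores, groups, item)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {delta:.2f}")  # alpha_MH = 2.00, MH D-DIF = -1.63
```

Here the sketch returns α_MH = 2.0 and an MH D-DIF of about −1.63, flagging an item that favors the reference group among equally scoring examinees. It omits the Mantel-Haenszel chi-square significance test, and a flagged item would still need review before being called biased.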

McNamara and Roever (2007) argue that DIF is "a necessary condition but not sufficient condition for bias because a test item that functions differently for two groups might do so because it advantages one group in a construct-irrelevant way, but there might also be a legitimate reason for differential functioning" (p. 83). Therefore, once DIF is identified, it may be attributed to item bias or item impact.

Item bias refers to a statistically significant difference across two or more groups of examinees that is due to characteristics of the item unrelated to the construct being measured. …
