Academic journal article International Journal of Education

Revisiting Differential Item Functioning: Implications for Fairness Investigation

Abstract

Fairness has been a priority in educational assessments during the past few decades. Differential item functioning (DIF) has become an important statistical procedure in the investigation of assessment fairness. For any given large-scale assessment, DIF evaluation is suggested as a standard procedure by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. This procedure offers opportunities to check for group differences in test performance and to investigate whether or not these differences indicate bias. However, current DIF research has drawn several criticisms. Revisiting DIF, this paper critically reviews current DIF research and proposes new directions for DIF research in the investigation of assessment fairness.

Keywords: differential item functioning; item bias; item impact; assessment fairness

1. Introduction

Fairness has been a priority in educational assessments during the past few decades (Cole & Zieky, 2001). According to the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) (1999), educational organizations, institutions, and individual professionals should make assessments as fair as possible for test takers of different races, genders, and ethnic backgrounds. Related to assessment fairness is the term Differential Item Functioning (DIF). Lord (1980) provided the following definition for DIF: "If each test item has exactly the same item characteristic curve in every group, then people of the same ability would have exactly the same chance of getting the item right, regardless of group membership ... If, on the other hand, an item has a different item characteristic curve for one group compared to another, it is clear the item is functioning differently across groups."
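As a minimal sketch of Lord's definition, assuming a two-parameter logistic (2PL) item characteristic curve (the article does not commit to a particular item response model), the Python snippet that follows compares the probability of a correct response for examinees of equal ability in a reference group and a focal group. The item parameters, the group labels, and the helper names `icc_2pl` and `prob_gap` are hypothetical illustration choices, not estimates from any real assessment.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """P(correct) at ability theta under a 2PL curve with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical (a, b) parameters for the same item, estimated separately in each group.
no_dif_item = {"reference": (1.2, 0.0), "focal": (1.2, 0.0)}  # identical curves
dif_item = {"reference": (1.2, 0.0), "focal": (1.2, 0.5)}     # harder for the focal group


def prob_gap(item: dict, theta: float) -> float:
    """Reference-minus-focal difference in P(correct) for examinees of equal ability theta."""
    p_ref = icc_2pl(theta, *item["reference"])
    p_foc = icc_2pl(theta, *item["focal"])
    return p_ref - p_foc


for theta in (-1.0, 0.0, 1.0):
    print(
        f"theta={theta:+.1f}  "
        f"no-DIF gap={prob_gap(no_dif_item, theta):.3f}  "
        f"DIF gap={prob_gap(dif_item, theta):.3f}"
    )
```

An item whose curve is identical across groups yields a zero gap at every ability level, whereas shifting only the focal group's difficulty parameter produces a nonzero gap among equally able examinees, which is exactly the situation Lord describes as an item "functioning differently across groups."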

DIF research has received increased attention in educational and psychological contexts (Camilli & Shepard, 1994; Clauser & Mazor, 1998; Dickinson, Wanichtanom, & Coates, 2003; Gierl, Rogers, & Klinger, 1999; Huang & Sheeran, 2011). For any given large-scale assessment, DIF evaluation is suggested as a standard procedure by AERA, APA, and NCME (1999). This procedure often affords opportunities to check for group differences in test performance and investigate whether or not these differences indicate bias.

However, many concerns have been raised about current DIF research. For example, several statistical methods exist for detecting DIF, but these methods do not yield consistent and stable results (Gierl et al., 1999). Further, many DIF studies did not include a judgmental analysis (Camilli & Shepard, 1994). These concerns may limit how the impact of DIF on test development is interpreted. Therefore, it is important to revisit DIF and provide implications for research and practice in the area of investigating fairness in educational assessments.

Understanding the concept of DIF is the first step; the next is to discuss the procedures for detecting DIF; the final step is to explore why DIF occurs and to draw implications for fairness investigation in educational research. This paper first introduces some basic terms and their definitions. It then compares and contrasts various statistical procedures for detecting DIF. After that, it summarizes current research results on the interpretation and explanation of DIF. The following sections critique current DIF research and propose new directions for DIF research in the investigation of assessment fairness.

2. DIF, Item Bias, and Item Impact

DIF occurs when an item is substantially harder for one group than for another after overall differences in knowledge of the subject tested are taken into account. DIF therefore refers to the ways in which items function differently for individuals or groups of test takers with similar abilities (Kunnan, 1990). …
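One widely used way to take overall differences into account is to match examinees on total test score and compare item performance within each score level. The sketch that follows uses the Mantel-Haenszel common odds ratio, a standard DIF statistic that this excerpt does not itself describe, together with the delta-scale transformation MH D-DIF = -2.35 ln(alpha_MH). The function name `mantel_haenszel_dif`, the helper `make_records`, the record format, and all counts are hypothetical and serve only to illustrate the matching logic.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Return (alpha_MH, MH D-DIF) for one studied item.

    records: iterable of (group, total_score, item_correct) tuples (hypothetical format).
    group is "reference" or anything else (treated as focal); item_correct is 1 or 0.
    The total score is the matching variable that defines the strata.
    """
    # Per-stratum 2x2 counts: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        offset = 0 if group == "reference" else 2
        strata[score][offset + (0 if correct else 1)] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # reference-correct x focal-incorrect
        den += b * c / n  # reference-incorrect x focal-correct
    if num == 0 or den == 0:
        raise ValueError("Mantel-Haenszel odds ratio is undefined for these counts")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


def make_records(group, score, n_correct, n_incorrect):
    """Build hypothetical response records for one group within one score stratum."""
    return [(group, score, 1)] * n_correct + [(group, score, 0)] * n_incorrect


data = (
    make_records("reference", 10, 6, 4) + make_records("focal", 10, 4, 6)
    + make_records("reference", 20, 8, 2) + make_records("focal", 20, 6, 4)
)
alpha, d_dif = mantel_haenszel_dif(data)
print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {d_dif:.3f}")
```

An odds ratio above 1 (a negative MH D-DIF) means that, within matched score levels, reference-group examinees had higher odds of answering the item correctly, so the item was harder for the focal group after overall ability was held constant. Whether such a statistical difference reflects bias still calls for judgmental review of the item.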