Academic journal article Kuram ve Uygulamada Egitim Bilimleri

Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

Article excerpt

Psychological and educational tests are frequently used to explore individual academic performance, educational needs, and curriculum effectiveness. The results obtained from these tests form the basis for critical decisions: getting to know individuals, employing them or placing them in institutions or schools, and selecting, guiding, and assessing people. It is therefore essential to demonstrate empirically that test scores have high validity and reliability. Moreover, the soundness of decisions made by individual or institutional test developers, practitioners, and interpreters on the basis of test scores depends on developing and implementing appropriate methods for examining test development and psychometric qualities (Camilli & Shepard, 1994; Holland & Wainer, 1993).

In large-scale assessments in Turkey, such as the Transition to Higher Education Examination (YGS) and the Public Personnel Selection Examination (KPSS), test forms are produced with a different item order for each examination and presented as "personally identifiable booklets." Although the primary purpose of this practice is to prevent cheating, it is considered to carry a high potential for adverse outcomes for examinees. The accepted approach to sequencing items in a test form is to order questions from easy to hard; in other words, starting a test with easy items and increasing item difficulty through the test is a general principle of measurement and evaluation. Disregarding this principle may increase examinees' anxiety, undermine their self-confidence, and disturb their mental integrity. Taking test forms with different item orders, for instance, may cause examinees to experience different levels of anxiety: an easy-to-hard form is likely to provoke less anxiety than a hard-to-easy form. Another potential problem is the disturbance of content integrity caused by different item orders, which obstructs examinees' mental processes, shortens concentration spans, and hampers focusing on the test. This lowers motivation, disturbs self-esteem, and thus adversely influences test performance (Ankara University, 2011).

The use of "personally identifiable booklets" in nationwide examinations in Turkey may produce negative outcomes for the psychometric qualities of tests and for test takers, such as breaking the principle of equivalence across examinations, negatively affecting examinees' psychology, and lowering performance. Such examinations can therefore yield biased measurement results, either in favor of or against examinees. Because item order varies across forms, this alone could lead students to perceive items as "harder" or "easier" (Balch, 1989; Impara & Foster, 2006; Laffitte, 1984; Pettijohn & Sacco, 2007). Some reviews of item-order studies have concluded that item order does not influence student test performance (Barcikowski & Olsen, 1975; Carlson & Ostrosky, 1992; Gerow, 1980; Klosner & Gellman, 1973; Tippets & Benson, 1989). However, a review of research on item order in test forms makes clear that most studies have been based on Classical Test Theory (CTT). Other studies have concluded that item order influences test scores, item parameters, and completion time (Balch, 1989; Picou & Milhomme, 1997).

Two issues that remain important in measurement and evaluation are the comparison of scores from different test forms used to measure the same construct and the examination of whether items/tests function similarly across sub-groups. Research on differential item functioning (DIF) is needed to study and address these issues further. DIF occurs when examinees of equal ability on the measured construct, but belonging to different groups defined by a characteristic irrelevant to that construct (e.g., gender, ethnicity, and so on), have different probabilities of answering a given item correctly (Zumbo, 1999). …
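To make the idea of DIF concrete, a minimal sketch of one widely used detection method, the Mantel-Haenszel procedure (Holland & Wainer, 1993, describe it in detail), is shown below. The function name and the data layout are illustrative assumptions, not taken from the article: examinees are first matched on ability (e.g., by total score strata), and for each stratum a 2x2 table of group membership by item response is tallied.

```python
import math

def mantel_haenszel_dif(strata):
    """Estimate the Mantel-Haenszel common odds ratio for one item.

    Illustrative sketch: `strata` is a list of tuples
    (ref_correct, ref_wrong, focal_correct, focal_wrong),
    one tuple per ability stratum. Returns (alpha_MH, delta_MH).
    alpha_MH near 1 (delta_MH near 0) suggests no DIF; a negative
    delta_MH indicates DIF favoring the reference group (ETS convention).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d  # total examinees in this stratum
        numerator += a * d / n
        denominator += b * c / n
    alpha = numerator / denominator
    delta = -2.35 * math.log(alpha)  # rescale to the ETS delta metric
    return alpha, delta
```

For example, with two ability strata in which the reference group answers the item correctly more often than the matched focal group, `mantel_haenszel_dif([(40, 10, 30, 20), (25, 25, 20, 30)])` yields an odds ratio above 1 and a negative delta, flagging possible DIF against the focal group. In the context of this article, the "groups" being compared would be examinees who received test forms with different item orders.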
