Academic journal article Psychological Test and Assessment Modeling

Incorporating Different Response Formats of Competence Tests in an IRT Model

Article excerpt

International large-scale assessments as well as national studies on students' achievement have to deal with the challenge of measuring the participants' competencies efficiently and precisely. When operationalizing the theoretical constructs of the competencies to be measured, one relevant issue is the choice of item format. To exploit the strengths and compensate for the weaknesses of each format, Martinez (1999) recommended combining item formats in test instruments. Taking validity and variation into account, competence tests in (large-scale) assessments, for example the Programme for International Student Assessment (PISA), the Trends in International Mathematics and Science Study (TIMSS), the National Assessment of Educational Progress (NAEP), or the National Educational Panel Study (NEPS), hence usually contain different response formats to comprehensively assess the subjects' competencies (Allen, Donoghue, & Schoeps, 2001; OECD, 2012; Olson, Martin, & Mullis, 2008).

A common classification of item formats is the differentiation between selected-response (SR) and constructed-response (CR) formats (Haladyna & Rodriguez, 2013; Osterlind, 1998). SR items present correct and incorrect options to a problem and require the examinee to select one or several of them. In CR items, no options are presented; instead, the examinee has to generate the answer, usually by writing down a word or a few short sentences. McMillan (2000) outlined that, in comparison to CR formats such as essays, oral questions, or observations, SR items have the broadest spectrum in measuring competencies and skills. As SR formats are the most widely used item types in achievement tests of large-scale studies (Bleske-Rechek, Zeug, & Webb, 2007; Osterlind, 1998), in the following we focus on the common SR formats.

The two most well-established types of SR items in competence tests are multiple choice items and true-false items (Osterlind, 1998). The well-known multiple choice (MC) item comprises an item stem, that is, a question or an incomplete sentence, and different response options, most commonly four or five, consisting of the correct answer and wrong answers, the so-called distractors (Haladyna & Rodriguez, 2013). True-false items are a popular variation of the MC format and require the examinee to make a binary choice (Haladyna, 1992). Often, true-false items are arranged into complex multiple choice (CMC) items that include a number of "true/false" statements. CMC items are applied, for instance, in the PISA and NEPS studies (Adams & Wu, 2002; Pohl & Carstensen, 2013). Note that the term complex multiple choice item is not used consistently in the literature. In recent large-scale studies such as PISA or NEPS it denotes multiple true-false items, while other researchers use the term slightly differently for MC items whose response options offer combinations of correct answers (e.g., Haladyna & Rodriguez, 2013; Scalise & Gifford, 2006). In the following, we refer to CMC items as items including several binary subtasks, synonymous with multiple true-false items.

So far, large-scale studies have varied in how they incorporate MC and CMC response formats when scaling competence data. However, there is little research on how the two response formats can be treated adequately in a scaling model. Specific questions that arise when implementing the response formats in a scaling model are: Do MC and CMC items measure the same latent trait? What impact should MC and CMC items have on the overall competence score? Should they be weighted equally in the scaling model? Should CMC items with more subtasks have a larger impact on the overall competence score? The purpose of the present study was to approach these questions by compiling theoretical considerations about the response formats and by thoroughly analyzing empirical data. Through a systematic investigation of the questions concerning dimensionality and weighting across a variety of competence tests, we aimed to delineate implications for implementing the two response formats in a measurement model. …
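To make the weighting and dimensionality questions concrete, consider a minimal sketch in generic IRT notation (introduced here for illustration only; it is not necessarily the scaling model analyzed in this study). A dichotomous MC item i can be scaled with a Rasch model, while a CMC item j with m_j binary subtasks can be aggregated to a polytomous score and scaled with a partial credit model:

P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)},

P(X_{pj} = x \mid \theta_p) = \frac{\exp\left(\sum_{k=1}^{x} (\theta_p - \delta_{jk})\right)}{\sum_{h=0}^{m_j} \exp\left(\sum_{k=1}^{h} (\theta_p - \delta_{jk})\right)}, \quad x = 0, \ldots, m_j,

where \theta_p denotes the ability of person p, \beta_i the difficulty of MC item i, \delta_{jk} the step difficulties of CMC item j, and the empty sum for x = 0 equals zero. Under this scoring, a CMC item contributes up to m_j score points and therefore has a larger impact on the person estimate than a single MC item; rescoring its categories (e.g., assigning 0.5 points per subtask) or modeling the two formats as separate dimensions are alternative specifications that correspond directly to the questions raised above.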
