Academic journal article Psychological Test and Assessment Modeling

Actual Type-I- and Type-II-Risk of Four Different Model Tests of the Rasch Model

Introduction

It is by now well known that only if the Rasch model holds for a psychological (achievement) test is the number of correctly answered items a sufficient statistic for a person's ability (Fischer, 1995). Several model tests are available for testing the Rasch model, but their actual type-I- and type-II-risks have rarely been examined. To help choose the best model test (and plan the required sample size) when an achievement test is to be constructed, simulation studies were carried out in this paper to compare four model tests with respect to their type-I-risk and their power: Andersen's Likelihood-Ratio test (Andersen, 1973), the z-test of Fischer and Scheiblechner (1970) with the estimation approach of Wald (1943) (first applied by Fischer and Ponocny-Seliger, 1998), the Martin-Löf test (Martin-Löf, 1973), and the approach of Kubinger, Rasch, and Yanagida (2009), who proposed a three-way nested analysis of variance. Except for the last, all these test statistics are only asymptotically χ2-distributed under the null hypothesis, so the nominal type-I-risk might not hold; moreover, there is no formula for calculating the type-II-risk given the type-I-risk, the sample size, and the effect size. The only way to determine the type-II-risk is via simulation. Therefore the model tests were compared for different kinds of model violations as well as for the case in which the null hypothesis is true.
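The simulation setup underlying such studies can be sketched as follows. This is a minimal illustration, not the authors' actual code; the standard-normal person parameters and the equally spaced item parameters are assumptions for the example:

```python
import numpy as np

def simulate_rasch(theta, beta, rng):
    """Draw a persons x items matrix of dichotomous responses under the
    Rasch model: P(X_vi = 1) = exp(theta_v - beta_i) / (1 + exp(theta_v - beta_i))."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, size=500)   # person abilities (assumed N(0, 1))
beta = np.linspace(-2.0, 2.0, 20)        # item difficulties (assumed grid)
X = simulate_rasch(theta, beta, rng)
print(X.shape)                           # (500, 20)
```

Repeating such draws many times, once under the model and once under a deliberately built-in violation, yields empirical rejection rates and hence estimates of the actual type-I- and type-II-risk.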

Alexandrowicz (2002) showed in a simulation study that the power of Andersen's Likelihood-Ratio test is influenced more by the sample size than by the kind of model violation when the internal split criterion "high vs. low score" is used. Andersen's Likelihood-Ratio test is designed to detect differential item functioning (DIF) with respect to either an internal or an external criterion for splitting the sample of tested persons; it is a global model test. The z-test of Fischer and Scheiblechner (1970) with Wald's estimation approach tests for each item separately whether there is DIF, while the Martin-Löf test (Martin-Löf, 1973) tests whether two hypothesized subgroups of items measure the same ability (dimension). A simulation study by Verguts and De Boeck (2000) showed that the number of items affects the type-I-risk: it decreases with an increasing number of items (given a small sample of tested persons). For 24 items and 5000 persons the type-I-risk was 0%.
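The per-item statistic has a simple form: the difference between the item's difficulty estimates in the two subsamples, divided by the pooled standard error. A minimal sketch; the estimates and standard errors below are hypothetical numbers for one item, not conditional ML estimates from real data:

```python
import math

def wald_z(b1, se1, b2, se2):
    """Wald-type z statistic for one item: compares the item's difficulty
    estimates b1, b2 from two subsamples with standard errors se1, se2."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

# hypothetical difficulty estimates for one item, split "high vs. low score"
z = wald_z(0.30, 0.12, -0.05, 0.10)
print(round(z, 2))  # -> 2.24; |z| > 1.96 would indicate DIF at the 5% level
```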

Kubinger, Rasch, and Yanagida (2009; see also Kubinger, Rasch, & Yanagida, 2011, as well as Rasch, Kubinger, & Yanagida, 2011) proposed a new method to test the Rasch model with regard to an external split criterion, with the purpose of calculating proper sample sizes for a given type-I-risk, power, and effect size. They suggested using a three-way (nested) analysis of variance with mixed classification [i.e. (A f B) x C]. A is a fixed factor that splits the data into two groups of tested persons. B represents the persons and is a random factor; it is nested in A because each person is assigned to only one group of A. (A f B) is cross-classified with C, which represents the items. Under H0 there is no interaction effect A x C, which means that specific objectivity, and therefore the Rasch model, holds. If the respective F-test of the interaction term A x C is significant, the null hypothesis must be rejected because the data do not confirm the Rasch model.
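A numpy sketch of this interaction F-test, assuming a balanced layout (equal group sizes) and using the textbook sums of squares for the (A f B) x C mixed classification; this is an illustration under those assumptions, not the authors' implementation:

```python
import numpy as np

def interaction_F(X):
    """F-test of the A x C interaction in the (A f B) x C mixed design.

    X has shape (a, n, c): a groups (A, fixed), n persons per group
    (B, random, nested in A), c items (C, fixed); one observation per cell.
    Returns F = MS_AxC / MS_BxC(within A) and the degrees of freedom.
    """
    a, n, c = X.shape
    grand = X.mean()
    m_jk = X.mean(axis=1)            # group x item means, shape (a, c)
    m_j = X.mean(axis=(1, 2))        # group means, shape (a,)
    m_k = X.mean(axis=(0, 1))        # item means, shape (c,)
    m_ji = X.mean(axis=2)            # person means, shape (a, n)

    ss_axc = n * ((m_jk - m_j[:, None] - m_k[None, :] + grand) ** 2).sum()
    resid = X - m_ji[:, :, None] - m_jk[:, None, :] + m_j[:, None, None]
    ss_bxc = (resid ** 2).sum()

    df1 = (a - 1) * (c - 1)
    df2 = a * (n - 1) * (c - 1)
    return (ss_axc / df1) / (ss_bxc / df2), df1, df2

rng = np.random.default_rng(2)
X = (rng.random((2, 100, 10)) < 0.6).astype(int)   # toy dichotomous data
F, df1, df2 = interaction_F(X)
print(df1, df2)   # 9 1782
```

A significant F at these degrees of freedom would indicate an A x C interaction, i.e. a violation of specific objectivity.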

One problem with this approach is that the assumption of normal distribution is violated, because the data are dichotomous. Moreover, there is only a single observation in each cell. Hence it is necessary to test via simulation studies whether these two facts affect the type-I- and type-II-risk. The authors showed that the actual type-I-risk is close to the nominal risk, given there is no main effect of A. If there is a main effect of A, however, the type-I-risk for the interaction effect A x C is artificially too high. In a second paper (Kubinger, Rasch, & Yanagida, 2011) the authors showed, as a restriction, that the type-I-risk of the interaction effect A x C, given there is no main effect of A, grows with increasing sample size and test length, as well as with a greater range of the item parameters. …
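The kind of Monte Carlo check described here can be sketched generically: simulate many data sets under H0, apply the test, and compare the rejection rate to the nominal level. The toy test below (a z-test on a normal mean, not one of the four Rasch model tests) merely illustrates the mechanics:

```python
import math
import numpy as np

def empirical_type1_risk(test_fn, simulate_h0, n_rep, alpha, rng):
    """Fraction of H0 data sets on which test_fn rejects at nominal level alpha."""
    rejections = sum(test_fn(simulate_h0(rng)) < alpha for _ in range(n_rep))
    return rejections / n_rep

def z_test_p(x):
    """Two-sided p-value of a z-test for mean 0 with known unit variance."""
    z = x.mean() * math.sqrt(len(x))
    return math.erfc(abs(z) / math.sqrt(2.0))

rng = np.random.default_rng(3)
rate = empirical_type1_risk(z_test_p, lambda r: r.normal(size=50), 2000, 0.05, rng)
print(round(rate, 3))  # should be close to the nominal 0.05
```

Plugging in a Rasch data generator and one of the four model tests in place of the toy components yields the actual type-I-risk under H0, and, with a built-in model violation, the empirical power.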
