An Empirical Investigation of the Effects of Three Methods of Handling Guessing and Risk Taking on the Psychometric Indices of a Test

Article excerpt

This study examines the effect of three scoring methods (number-correct, discouraging guessing, and the partial knowledge award) on the psychometric indices (reliability and validity) of a test, given examinees' risk-taking level. One hundred and twenty undergraduate students in a psychology research methodology class served as the sample. A 40-item multiple-choice test with four options per item was used to assess the effect of the different scoring methods on test reliability and validity, and a test of 10 nonsense items was used to classify the examinees into high risk-taking and low risk-taking groups. The results showed that the three methods produced different reliability and validity coefficients, with the partial knowledge method choice.

Ever since multiple-choice tests first became widely used in the 1920s, there has been concern that guessing affects the scores on these tests (Frary, 1996; Lord & Novick, 1968; Mehrens & Lehman, 1978; Nitko, 1983; Nunnally, 1978). Guessing changes the structure of the score matrix (in which items are scored 1 for a correct response and 0 for an incorrect response), specifically the number of ones it contains. Consequently, the items' frequencies of correct responses, the examinees' test scores, and hence the total test variance are also affected (Crocker & Algina, 1986; Magnusson, 1967; Nunnally, 1978).
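To make the effect on the score matrix concrete, here is a minimal simulation sketch. It assumes purely random guessing: every item an examinee does not know is answered anyway and turns from 0 into 1 with probability 1/k, where k is the number of options. The function names and the toy matrix are hypothetical and serve only as illustration; they are not taken from the study.

```python
import random
from statistics import pvariance

def add_random_guessing(scores, k, seed=0):
    """Return a copy of a 0/1 score matrix (rows = examinees, columns = items)
    in which each 0 becomes 1 with probability 1/k, as if the examinee guessed
    blindly among k options. Assumes no partial knowledge."""
    rng = random.Random(seed)
    return [[1 if x == 1 or rng.random() < 1 / k else 0 for x in row]
            for row in scores]

def item_p_values(scores):
    """Proportion of examinees answering each item correctly."""
    n = len(scores)
    return [sum(col) / n for col in zip(*scores)]

def total_score_variance(scores):
    """Population variance of the examinees' number-correct totals."""
    return pvariance([sum(row) for row in scores])

if __name__ == "__main__":
    known = [[1, 1, 1, 0, 0],
             [1, 1, 0, 0, 0],
             [1, 0, 0, 0, 0],
             [0, 0, 0, 0, 0]]
    guessed = add_random_guessing(known, k=4)
    print("p-values before:", item_p_values(known))
    print("p-values after: ", item_p_values(guessed))
    print("total variance before/after:",
          total_score_variance(known), total_score_variance(guessed))
```

Because guessing can only turn zeros into ones here, item p-values can only rise, while the total-score variance may move in either direction depending on which examinees happen to guess correctly.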

At first, the effect of guessing was not well understood, and score increases due to guessing were uncritically viewed as ill-gotten gains, even though these score components usually reflected partial knowledge (the ability to eliminate some wrong choices before guessing) (Frary, 1996; Nunnally, 1978). Specifically, guessing raises examinees' probability of responding correctly to items that they would otherwise have been unable to solve (Crocker & Algina, 1986; Frary, 1996; Magnusson, 1967; Nitko, 1983).
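The role of partial knowledge can be quantified with a short sketch. Assuming an examinee rules out some wrong options on a k-option item and then picks uniformly among the rest, the chance of a correct guess is 1/(k - e), where e is the number of options eliminated. The helper names below are hypothetical.

```python
from fractions import Fraction

def p_correct_guess(k, eliminated=0):
    """Probability that a guess is correct on a k-option item after ruling out
    `eliminated` wrong options (partial knowledge). Assumes a uniform choice
    among the remaining options."""
    if not 0 <= eliminated < k:
        raise ValueError("eliminated must be between 0 and k - 1")
    return Fraction(1, k - eliminated)

def expected_items_gained(n_unknown, k, eliminated=0):
    """Expected number of extra correct answers from guessing on n_unknown
    items, all with the same amount of partial knowledge."""
    return n_unknown * p_correct_guess(k, eliminated)

if __name__ == "__main__":
    # Four-option items, matching the test described in the abstract.
    for e in range(4):
        print(e, "options eliminated ->", p_correct_guess(4, e))
    print(expected_items_gained(12, 4), expected_items_gained(12, 4, 2))
```

On four-option items, blind guessing on 12 unknown items yields 3 extra correct answers on average, whereas eliminating two wrong options first doubles that to 6. This is why score gains from guessing often reflect partial knowledge rather than luck alone.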

When evaluating the consequences of guessing for the reliability and validity of a test, it is crucial to distinguish between the systematic and the random errors that guessing could produce. Systematic variance that appears in both the test scores and the criterion scores may influence the validity coefficient but does not affect reliability. However, if the error variance is not systematic, it will affect both the validity and the reliability indices (Magnusson, 1967; Nunnally, 1978).
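For dichotomously scored items, reliability is commonly estimated with the Kuder-Richardson formula 20, KR-20 = (n / (n - 1)) * (1 - sum(p_i * q_i) / var(X)), where n is the number of items, p_i is the proportion correct on item i, q_i = 1 - p_i, and var(X) is the total-score variance. The excerpt does not say which reliability index the study used, so this is only an illustrative sketch of how random error entering the 0/1 matrix would show up in such a coefficient.

```python
from statistics import pvariance

def kr20(scores):
    """Kuder-Richardson formula 20 for a 0/1 score matrix
    (rows = examinees, columns = items). Population variances are used for
    both the item terms and the total score so they are on the same footing."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")
    n_people = len(scores)
    sum_pq = 0.0
    for col in zip(*scores):
        p = sum(col) / n_people  # item difficulty (proportion correct)
        sum_pq += p * (1 - p)    # variance of a single 0/1 item
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_var)

if __name__ == "__main__":
    consistent = [[1, 1, 1, 0, 0],
                  [1, 1, 0, 0, 0],
                  [1, 0, 0, 0, 0],
                  [0, 0, 0, 0, 0]]
    print(kr20(consistent))  # 0.625
```

Random error that is uncorrelated across items inflates the item-variance sum relative to the total-score variance, which pulls KR-20 down; in the extreme case where items are unrelated to one another, the coefficient approaches zero.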

Some educators react to the effects of guessing by discouraging students from guessing at all, directly or indirectly condemning it as dishonest. Of course, admonishing students for guessing is ineffective, as well as unfair to those who refrain, as long as tests are scored on the basis of the number of correct answers. As a result, many educators have avoided multiple-choice tests even though such tests make scoring objective, accurate, and easier, provide better coverage of the instructional topics, and make statistical feedback at the item level possible.

To enhance test reliability, users are advised to include more items in the test and to increase the number of alternatives per item, but that is sometimes impracticable, especially for university or high school examinations where tests are timed (Burton, 2001). Another way to increase reliability is to reduce guessing. Guessing can generally be minimized in one of two ways: (1) applying a scoring formula that discourages guessing, such as formula scoring (the correction for guessing), which subtracts a fraction of the number of wrong answers from the number-correct score, or (2) awarding partial credit for omitted questions rather than deducting credit for incorrect answers. The latter approach has a psychological advantage over the former, since it rewards the desired behavior (refraining from guessing completely at random) rather than penalizing undesirable behavior (Crocker & Algina, 1986; Frary, 1996; Nunnally, 1978).
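The two remedies can be written as simple scoring rules. Let R be the number of right answers, W the number of wrong answers, O the number of omitted items, and k the number of options per item. The classical correction for guessing is S = R - W/(k - 1), and a common form of omission credit is S = R + O/k. The excerpt does not give the exact formulas used in the study, so the sketch below assumes these textbook versions; the function names and the `rule` labels are hypothetical.

```python
from fractions import Fraction

def number_correct(right, wrong, omitted, k):
    """Conventional score: one point per right answer, nothing else counts."""
    return Fraction(right)

def correction_for_guessing(right, wrong, omitted, k):
    """Formula score R - W/(k - 1)."""
    return right - Fraction(wrong, k - 1)

def omission_credit(right, wrong, omitted, k):
    """Assumed partial-credit rule R + O/k: an omitted item earns what a
    blind guess would earn on average."""
    return right + Fraction(omitted, k)

def guess_minus_omit(k, eliminated, rule):
    """Expected score difference between guessing and omitting one item for
    an examinee who has ruled out `eliminated` wrong options."""
    p = Fraction(1, k - eliminated)  # chance the guess is right
    if rule == "number_correct":
        return p                                  # omitting scores 0
    if rule == "formula":
        return p - (1 - p) * Fraction(1, k - 1)   # omitting scores 0
    if rule == "omit_credit":
        return p - Fraction(1, k)                 # omitting scores 1/k
    raise ValueError("unknown rule: " + rule)

if __name__ == "__main__":
    args = (30, 8, 2, 4)  # R, W, O, k for a hypothetical examinee
    print(number_correct(*args), correction_for_guessing(*args),
          omission_credit(*args))
    for rule in ("number_correct", "formula", "omit_credit"):
        print(rule, [guess_minus_omit(4, e, rule) for e in range(3)])
```

Under both the formula and the omission-credit rules, a blind guess (no options eliminated) is worth exactly as much as an omission in expectation, so only examinees with partial knowledge gain by answering. Under number-correct scoring, guessing always pays, which is why admonitions against guessing penalize only those who comply.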

Among empirical studies of methods for scoring test items, Diamond and Evans (1973) found no significant effects on reliability indices as a function of how guessing was handled. …