This study aims to compare the difficulty levels, discrimination powers, and achievement-testing power of multiple-choice tests and true-false tests, and thereby to test the commonly held hypothesis that multiple-choice tests do not share the same properties as true-false tests. The research was performed with 252 fourth-year students studying for a degree in teaching in the 2007-2008 academic year. This is a descriptive study. The research data were obtained through a 100-item test consisting of 50 multiple-choice and 50 true-false questions, which were prepared as equivalent and administered at the end of the semester. The data were analyzed with the SPSS program. The results are significant in that they show that true-false tests are not easier than multiple-choice ones, and that probable success stemming from the test structure does not contribute significantly to the difference in general.
Key Words: Multiple choice test, true-false test, difficulty coefficient, discriminating coefficient, reliability.
Achievement tests attempt to measure what an individual has learned, that is, his or her present level of performance. They are used in diagnosing strengths and weaknesses and as a basis for awarding prizes, scholarships, or degrees. Many of the achievement tests used in schools are nonstandardized, teacher-designed tests (Best & Kahn, 2006, p. 301).
Multiple-choice and true-false tests have a valuable role in higher education (Burton, 2005, p. 65). When we think of measuring achievement, the natural tendency is to think of published standardized multiple-choice, true-false, or fill-in tests, all of which rely on written test items that are read and answered correctly or incorrectly by the examinee (Stiggins, 1992, p. 211). It is a much-discussed fact that marks in multiple-choice and true-false tests may be obtained by guessing. However, the actual extent to which chance affects scores in this way is too little appreciated (Burton & Miller, 1999, p. 399). The assessment of student learning is an important issue for educators (Madaus & O'Dwyer, 1999). Since the early 1900s, traditional multiple-choice (MC) item formats have achieved a position of dominance in learning assessment, mainly due to the prima facie objectivity and the efficiency of administration this format represents. However, the popularity of the MC format has come under scrutiny for some applications where accuracy of assessment, particularly for complex knowledge domains, has greater importance than efficiency (Becker & Johnston, 1999; Bennett, Rock, & Wang, 1991). Traditional MC testing formats offer efficiency, objectivity, simplicity, and ease of use for the assessment of student knowledge, but are subject to many sources of interpretation error (Swartz, 2006, p. 215).
According to Traub (1991), no measurement is perfect. Chance may affect scores in multiple-choice and true-false tests in two ways. First, if the questions sample only part of the examinable subject matter, then a particular examinee may be lucky or unlucky in the examiner's choice of questions (Posey, 1932). Second, marks may be obtained by guessing. Test reliability is discussed in many textbooks, but few give the reader a quantitative feeling for the inherent unreliability of particular tests that is due to chance (Burton, 2001, p. 42). Although number-right scores can undoubtedly be considerably raised by guessing, it is a common belief that guessing is unimportant, or at least that completely random ('blind') guessing is. When multiple-choice tests began to be widely used, they were criticized because examinees could answer correctly by guessing. Many educators viewed any score gain from guessing as ill-gotten (Fray, 1988; Burton, 2004). In the general opinion of educators, true-false tests are limited to testing factual recall. However, Ebel (1979) demonstrates clearly that true-false items can be made to present quite difficult and complex problems. …
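The effect of blind guessing described above can be given a quantitative feel with a simple binomial model. The sketch below is illustrative only: it assumes purely random guessing on each item, uses the study's 50-item subtest lengths, and assumes four options per multiple-choice item (the number of options is not stated in the text).

```python
from math import comb

def blind_guess_expectation(n_items: int, p_correct: float) -> float:
    """Expected number-right score from purely random ('blind') guessing."""
    return n_items * p_correct

def prob_at_least(n_items: int, p_correct: float, k: int) -> float:
    """Probability of getting at least k items right by blind guessing,
    modeling the number-right score as Binomial(n_items, p_correct)."""
    return sum(
        comb(n_items, i) * p_correct**i * (1 - p_correct) ** (n_items - i)
        for i in range(k, n_items + 1)
    )

# 50 true-false items: each blind guess succeeds with probability 1/2.
tf_expected = blind_guess_expectation(50, 1 / 2)   # 25.0 items expected
# 50 multiple-choice items, assuming 4 options: success probability 1/4.
mc_expected = blind_guess_expectation(50, 1 / 4)   # 12.5 items expected
```

Under this model a blind guesser is expected to answer twice as many true-false items correctly as four-option multiple-choice items, which is why score gains from guessing are a more prominent concern for the true-false format.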