A Correlational Analysis of Multiple-Choice and Essay Assessment Measures
Kniveton, Bromley H, Research in Education
Increases in student numbers and cutbacks in resources over recent years have encouraged academics in Britain to revisit traditional views of their teaching. Initial reactions have concentrated on teaching methodology and class sizes. Tutorials have grown in size and been turned into seminars or discontinued altogether, and reading weeks and projects have come into more widespread use. In addition, the student profile is changing rapidly, with approximately 40 per cent of undergraduates now classified as 'mature students'. The introduction of Teaching Quality Assessment has added a further dimension. Academics are typically reviewing the ideological bases of their courses and determining the aims and objectives put before their students. A central aspect of most courses is assessment. Two questions can legitimately be raised: whether assessment is as efficient in its use of resources as it could be, and whether it actually measures the extent to which the student has achieved the course objectives. There are, however, other concerns which should not be forgotten. Assessment can provide students with feedback, which is academically desirable, and it can also, according to Michael (1991), be an extremely potent means of motivating and directing students.
Essays are traditionally accepted in many disciplines; they are quick and easy to set but take a great deal of time to mark, and increasing student numbers can make this a serious problem. Alternatives such as multiple-choice tests, however, are regarded by many with distaste. This is surprising in the light of the long history of educational use of objective testing. Objections include the reliance on recognition rather than recall and the rather dubious argument that reasoning and understanding cannot be measured. It has also been argued that not all students can afford to buy the textbooks on which questions are based and that multiple-choice questions are far more difficult to create than essay-type questions. It is intended in this article to examine the relationship between the grades given to students for essays and for multiple-choice tests. A long tradition of research has outlined the problems associated with marking essays. In a classic study sixty years ago Hartog and Rhodes (1935) found that experienced examiners were remarkably unreliable in their essay marking. This type of finding has been repeated continually over the years, as is most recently illustrated by Newstead and Dennis (1994), who highlighted the low level of reliability between essay markers. Newstead and Dennis suggest that one solution to the reliability problem would be to increase the number of objective assessments, and they specifically mention multiple-choice testing. This style of assessment may also go some way towards compensating for other limitations of the essay-style test which they do not mention. Gronland (1968), for instance, pointed out that essays can provide only a very limited sampling of attainment, since only a small number of questions can be included in any one test, and found that students who are lucky enough to select the right topics to revise tend to do best.
Indeed, Miller and Parlett (1973) found that students who were 'cue-seekers' gained the best degrees and 'cue-deaf' students gained the poorest.
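The reliability between essay markers discussed above is commonly expressed as a correlation between the marks two examiners award to the same scripts. The following is a minimal sketch of that calculation, not a procedure taken from any of the studies cited: the function name pearson_r and both lists of marks are hypothetical and invented purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical marks (per cent) given by two examiners to the same six essays.
# These figures are illustrative assumptions, not data from the article.
marker_a = [55, 62, 48, 70, 66, 58]
marker_b = [60, 58, 52, 65, 72, 50]

r = pearson_r(marker_a, marker_b)
print(f"Inter-marker correlation: r = {r:.2f}")
```

A coefficient near 1 would indicate that the two examiners rank and space the scripts similarly; a low coefficient corresponds to the kind of unreliability reported in the marking studies. The same calculation applies when the two lists hold each student's essay grade and multiple-choice grade.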
There is, however, research which supports the idea that the method of testing is not very important. Jacobson (1990), for instance, found that both short-answer and essay tests in coursework had comparable predictive validity for final examination performance. Stacy et al. (1990), examining self-reports of smoking behaviour, found a high correlation between open-ended questions and closed-ended questions with limited options. Barnard and Ehrenberg (1990), looking at consumer brand beliefs, found that three measures (free choice, scaling and ranking) yielded high correlations. How far these last two studies can be generalised to academic assessment is, of course, open to debate. …