Academic journal article Higher Education Studies

Teacher Interpretation of Test Scores and Feedback to Students in EFL Classrooms: A Comparison of Two Rating Methods

Article excerpt


Rating scales have long served as a major assessment instrument for measuring language performance on oral tasks. The present research examined how far teachers' interpretations of students' speaking scores differed when two rating methods sharing the same rating criteria were used, and how far students benefited from the feedback provided by each method's descriptions. Fifteen English teachers and 300 college students in Taiwan participated in the study. Under the same rating criteria, the teachers assessed an English speaking test using two rating methods: one with level descriptors and the other with a checklist. The rating method was found to have a noticeable impact on how teachers judged students' performance and interpreted their scores. The majority of the students considered feedback from the verbally more detailed method more helpful for improving their language ability.

Keywords: rating scale, rating checklist, role-play, speaking, reflection

1. Introduction

1.1 Overview

Performance rating scales have long been used to provide information about test candidates' performance abilities in speaking or writing. The aim of using rating scales to interpret candidates' language ability is to diminish the low reliability of holistic scoring: by incorporating a number of relevant linguistic aspects, they help reduce the problem of biased or unfair judgment by scorers (Hughes, 2003). In my teaching experience in Taiwan, however, many university teachers have regarded using rating scales with detailed descriptors in speaking tests as a waste of time. They often turned instead to holistic scoring, simply giving students single scores based on overall speaking performance, only to discover after marking that it was hard to interpret the students' scores or to know how far the students had achieved the teaching and learning objectives. This made it difficult to verbalize students' performance on the basis of their scores and to offer informative feedback to students, to the teachers themselves, and to other relevant stakeholders. Rating methods have been developed and revised for use under different assessment circumstances. The present study examines Taiwanese university teachers' perceptions of using two different rating methods to interpret their students' oral scores on role-play and simulation tasks in an English course. The aim was to discover, first, whether and to what extent the formats of the two rating methods influenced the teachers' marking when the same test criteria were applied, and second, which type of rating the teachers thought would help them reflect on their teaching. In addition to the teachers' use of the rating methods, the students were asked which rating method helped them better reflect on their speaking performance after receiving feedback from both.
It is hoped that the results of the present study can provide teachers and educational researchers with guidance on whether to employ either rating method in the context of classroom assessment and feedback to students.

1.2 Rating Scale for Assessing Language Performance

Rating scales for performance assessment have been designed in various forms, depending on whether the research interest lies in the student's underlying ability or in the purposes of the test users (Hamp-Lyons, 1991; Alderson, 1991). The most common form of rating scale is the "performance data-driven scale", in which a separate score for each of a number of language aspects of a task, say grammar, vocabulary, or fluency, is given to a student, and the scores for the linguistic components are added up into a total score representing the student's performance on the task. Constructing such a scale requires collecting samples of learners performing test tasks in specific language contexts, followed by transcription and the identification of key performance features through discourse analysis (Fulcher, 2003; Fulcher, Davidson, & Kemp, 2011). …
