Studia Anglica Posnaniensia: International Review of English Studies

Differences across Levels in the Language of Agency and Ability in Rating Scales for Large-Scale Second Language Writing Assessments

I. Introduction

While the literature on language testing and writing assessment is rich with studies evaluating the validity and reliability of given assessments, a relatively small body of literature explores the actual language of the writing rubrics themselves. In this study, writing rubrics may be understood as demonstrating the same range of features and functions described by Covill (2012): "a list of criteria that are relevant to producing effective writing", generally featuring multiple levels with descriptors, used for rating, placement, instruction, or a combination of these functions. (1) In the case of large-scale assessments of second language writing, such as those that are part of the Test of English as a Foreign Language (TOEFL) or International English Language Testing System (IELTS) exams, raters may experience "tension" between the language of the performance descriptors in the rating scale and their own "intuitive impression" of a given learner text, a tension that is addressed, though not fully resolved, by rater training (Lumley 2002: 246). Lumley goes on to argue that "[r]ather than offering descriptions of the texts, the role of the scale wordings seems to be more one of providing justifications on which the raters can hang their scoring decisions" (Lumley 2002: 266). Further, trained raters are not the only audience for these scales. In an effort to educate test takers, teachers, and schools about their tests, Educational Testing Service (ETS) and Cambridge English Language Assessment, as well as other testing agencies, provide public versions of their rubrics online. Thus the language of these rubrics is available to a general audience, who may use it to prepare for the exam, to borrow or adapt for their own assessments, or to consider in admissions decisions.

Some of the literature that does examine the language of performance descriptors focuses on the content of the rubrics. For instance, Matsuda & Jeffery (2012) analyze the (lack of) attention to voice in performance descriptors for writing assessments on national English proficiency tests and standardized tests of college readiness. Jeffery (2009) likewise provides a content analysis of large-scale writing assessment rubrics, along with a syntactic analysis of prompts.

The language of performance descriptors, including their syntactic structure, is frequently referenced in the language and writing assessment literature, but it remains a challenging area. The limitations of distinguishing between levels exclusively through adjectives and adverbs of degree are frequently noted, e.g., by Hawkey & Barker (2004: 127) and Knoch (2011: 82). Other scholarship on rating scales considers the level of specificity and detail of the descriptors, e.g., Knoch (2007: 121), Brindley (1998), and Upshur & Turner (1995). Specific attention to verbs also appears in the "can do" statements of the Common European Framework of Reference for Languages (CEFR), and the language of performance descriptors is again referenced when North (2007: 657) reflects that all the descriptors for the CEFR are "worded in positive terms, even for lower levels". Nevertheless, the language of performance descriptors in rating scales continues to be an issue for language testing and writing assessment; Kuiken & Vedder (2014: 283) identify "the potentially multiple interpretation and vagueness of some of the scale descriptors" as an area for future research, and Alderson et al. (2004) discuss a similar issue in describing levels of reading and listening in language testing.

However, scholarly accounts of the development of rating scales for writing, including the creation and revision of performance descriptors, are limited. In introducing his corpus study of the language of first-year college writing rubrics, Dryer (2013: 5) asserts that "[l]ittle is known about how scales themselves are composed, and few field-tested recommendations for scaling performance categories exist as of this writing". …
