Magazine article AI Magazine

Automated Essay Evaluation: The Criterion Online Writing Service

Article excerpt

The best way to improve one's writing skills is to write, receive feedback from an instructor, revise based on the feedback, and then repeat the whole process as often as possible. Unfortunately, this puts an enormous load on the classroom teacher, who is faced with reading and providing feedback for perhaps 30 essays or more every time a topic is assigned. As a result, teachers are not able to give writing assignments as often as they would wish.

With this in mind, researchers have sought to develop applications that automate essay scoring and evaluation. Work in automated essay scoring began in the early 1960s and has been extremely productive (Page 1966; Burstein et al. 1998; Foltz, Kintsch, and Landauer 1998; Larkey 1998; Rudner 2002; Elliott 2003). Detailed descriptions of most of these systems appear in Shermis and Burstein (2003). Pioneering work in the related area of automated feedback was initiated in the 1980s with the Writer's Workbench (MacDonald et al. 1982).

The Criterion Online Essay Evaluation Service combines automated essay scoring and diagnostic feedback. The feedback is specific to the student's essay and is based on the kinds of evaluations that teachers typically provide when grading a student's writing. Criterion is intended to be an aid, not a replacement, for classroom instruction. Its purpose is to ease the instructor's load, thereby enabling the instructor to give students more practice writing essays.

Criterion contains two complementary applications based on natural language processing (NLP) methods. Critique comprises a suite of programs that evaluate and provide feedback for errors in grammar, usage, and mechanics; that identify the essay's discourse structure; and that recognize potentially undesirable stylistic features. The companion scoring application, e-rater version 2.0, extracts linguistically based features from an essay and uses a statistical model of how these features relate to overall writing quality to assign a holistic score to the essay. Figure 1 shows Criterion's interface for submitting an essay, and figures 2 and 3 provide examples of its evaluations and feedback.
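The scoring side can be pictured as a regression from essay features to human holistic scores. The following is a minimal sketch under stated assumptions: the feature names, the tiny training set, and the gradient-descent fit are invented for illustration and are not e-rater's actual features or estimation method.

```python
# Illustrative sketch only: the feature set, training data, and the plain
# least-squares fit below are assumptions for exposition, not the actual
# e-rater features or model.

# Hypothetical per-essay features:
# [grammar-error rate, vocabulary-variety score, discourse elements found]
train_features = [
    [0.08, 0.40, 2],
    [0.05, 0.55, 3],
    [0.02, 0.70, 5],
    [0.01, 0.80, 6],
]
train_scores = [2.0, 3.0, 5.0, 6.0]  # holistic scores from human raters

def predict(weights, bias, feats):
    """Score an essay as a weighted combination of its features."""
    return bias + sum(w * x for w, x in zip(weights, feats))

def fit(features, scores, lr=0.01, epochs=5000):
    """Least-squares fit of weights and bias by stochastic gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for feats, y in zip(features, scores):
            err = predict(w, b, feats) - y
            b -= lr * err
            for j, x in enumerate(feats):
                w[j] -= lr * err * x
    return w, b

w, b = fit(train_features, train_scores)

# Score a new (hypothetical) essay whose features fall between the
# training examples.
print(round(predict(w, b, [0.03, 0.60, 4]), 1))
```

In practice, e-rater's model is built from large pools of human-scored essays; the point here is only the shape of the mapping from extracted features to a holistic score.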


Critique Writing Analysis Tools

The Critique Writing Analysis Tools detect numerous errors under the broad headings of grammar, usage, and mechanics. The system also highlights potentially undesirable style--such as too much repetition. Finally, Critique identifies the essay's discourse elements for the student. In this article, we describe those aspects of Critique that use NLP and statistical machine learning techniques.

Grammar, Usage, and Mechanics

The writing analysis tools identify five main types of errors--agreement errors, verb formation errors, wrong word use, missing punctuation, and typographical/proofreading errors. Some examples are shown in table 1. The approach to detecting violations of general English grammar is corpus-based and statistical. The system is trained on a large corpus of edited text, from which it extracts and counts sequences of adjacent word and part-of-speech pairs called bigrams. The system then searches student essays for bigrams that occur much less often than is expected based on the corpus frequencies.

The expected frequencies come from a model of English based on 30 million words of newspaper text. Every word in the corpus is tagged with its part of speech using a version of the MXPOST (Ratnaparkhi 1996) part-of-speech tagger that has been trained on student essays. For example, the singular indefinite determiner a is labeled with the part-of-speech symbol AT, the adjective good is tagged JJ, and the singular common noun job gets the label NN. After the corpus is tagged, frequencies are collected for each tag and for each function word (determiners, prepositions, and so on), and also for each adjacent pair of tags and function words. …
