Applications of Computers
in Assessment and Analysis of Writing
Mark D. Shermis, Jill Burstein, and Claudia Leacock
It has been almost 40 years since Ellis "Bo" Page prophesied the imminence of grading essays by computer. As a former high school English teacher, he envisioned a world where computers could assist in reducing the burden of grading written English. Later, as a seminal educational researcher, his desire was to implement a system that would operationalize what 50 years of research had shown us: Students become better writers by writing more. His landmark article in Phi Delta Kappan (Page, 1966) forecast such a system, but it took another 7 years to produce a working model (Ajay, Tillett, & Page, 1973) using FORTRAN code and a large mainframe computer. The result was Project Essay Grade (PEG). In order to submit text to the essay grader, written documents had to be transferred to awkward IBM 80-column punched cards, an overwhelming task for the technology of the day. Despite this handicap, the technology performed as well as or better than the ratings assigned by humans (Ajay et al., 1973).
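Although the original FORTRAN implementation is a product of its era, PEG's basic strategy is well documented: regress human ratings onto measurable proxies of writing quality (what Page called "proxes"), such as essay length and average word length. The following Python sketch is a minimal, hypothetical reconstruction of that idea, not Page's actual model; the features and data are invented for illustration.

```python
import numpy as np

# Hypothetical training data: surface features ("proxes") for eight essays.
# Columns: word count, mean word length, sentence count -- invented values.
X = np.array([
    [250, 4.1, 12],
    [410, 4.6, 19],
    [180, 3.9,  9],
    [520, 4.8, 24],
    [300, 4.3, 14],
    [450, 4.5, 21],
    [220, 4.0, 11],
    [380, 4.4, 17],
], dtype=float)
y = np.array([2, 4, 1, 5, 3, 4, 2, 4], dtype=float)  # human holistic scores

# Ordinary least squares: score ~ intercept + weighted sum of the proxes.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

def predict_score(word_count, mean_word_len, sentence_count):
    """Predict a holistic score for a new essay from its surface features."""
    feats = np.array([1.0, word_count, mean_word_len, sentence_count])
    return float(feats @ coef)

print(round(predict_score(320, 4.4, 15), 2))
```

In practice, operational systems use far richer feature sets and larger training samples, but the underlying logic of calibrating a statistical model against human scores remains the same.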
In the early 1990s, Page refashioned PEG with a more sophisticated parser, real-time calculations, and a Web-based interface (Shermis, Mzumara, Olson, & Harrington, 2001). This chapter explores the details of this new incarnation, demonstrating how automated essay scoring (AES) works, the kinds of software packages that perform it, and some emerging applications.
AES is the evaluation of written work via computers. Initial research restricted AES to English, but the technology has recently been extended to Japanese (Kawate-Mierzejewska, 2003), Hebrew (Vantage Learning, n.d.), Bahasa (Vantage Learning, 2002), and other languages. The interfaces are predominantly Internet-based, though some implementations use CD-ROMs.
Most packages place documents within an electronic portfolio. They provide a holistic assessment of the writing, which can be supplemented by trait scores based on an established rubric and by qualitative critiques derived from discourse analysis. Most use human ratings as the criterion for determining scoring accuracy, though some packages permit validation against other sources of information (e.g., large informational databases).
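Because human ratings serve as the criterion, a system's accuracy is typically summarized with agreement statistics between machine-assigned and human-assigned scores. The sketch below, using invented scores on a hypothetical 1-6 holistic scale, computes three summaries common in the AES literature: exact agreement, adjacent agreement (within one point), and quadratic weighted kappa.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Invented example: holistic scores (1-6 scale) for eight essays.
human   = np.array([4, 3, 5, 2, 4, 6, 3, 5])
machine = np.array([4, 3, 4, 2, 5, 6, 3, 4])

exact = np.mean(human == machine)                 # identical scores
adjacent = np.mean(np.abs(human - machine) <= 1)  # within one point
qwk = cohen_kappa_score(human, machine, weights="quadratic")

print(f"exact agreement:          {exact:.2f}")
print(f"adjacent agreement:       {adjacent:.2f}")
print(f"quadratic weighted kappa: {qwk:.2f}")
```

Exact and adjacent agreement are easy to interpret but inflated by chance; kappa corrects for chance agreement, which is why weighted kappa is often reported alongside the raw percentages.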
Obviously, computers do not "understand" written messages in the same way that humans do, a point that may be unnerving until one reflects on the ways alternative technologies achieve similar results. Cooking, for example, was once associated primarily with convection heating, a form of heating external to the food. But by thinking "outside the box," we can see that the same outcome can be achieved by a technology based not on convection but on molecular activity within the uncooked items (i.e., the microwave oven).