Putting Machine Testing to the Test: Next-Generation Educational Standards Meet Next-Generation Scoring Methods, but with Controversy

The goal of raising academic achievement in the United States has led to a number of remedies, ranging from the No Child Left Behind Act (2001) to the Common Core State Standards (2010). This has meant not only more testing, but also more-complex testing of students. In order to keep up with grading these tests, a growing trend is to use machine scoring--even on the essay portion of standardized tests.

The need to address declining and mediocre standardized test scores means that future K-12 testing and assessment will require more complex tasks than the current selected-response format, in which students can arrive at correct answers by guessing. New testing methods will rely more heavily on performance-based tasks and constructed responses. Essay questions will require students to demonstrate independent thinking, select and organize information from provided references, and use reasoning skills to develop their answers.

Since 2005, the Scholastic Aptitude Test (SAT) for college entrance has included an essay section to measure writing skills. Graduate and professional schools are also using writing samples as part of the admissions process. More recently, universities have been experimenting with Massive Open Online Courses (MOOCs), some enrolling more than 100,000 students. Testing in these programs generates an extremely large number of essays.

All of these trends--new educational standards, compulsory statewide student assessments, the college admissions process, and the emergence of MOOCs--have created a new business model in testing, and several scenarios could potentially develop for assessing students.

The market for standardized educational testing has attracted commercial ventures and become a multimillion-dollar business. Through competitive bidding, educational testing companies have entered into contracts with individual states and private schools, employing thousands of readers to score the tests.

For overall student assessment, grading essays is the most expensive component of standardized educational testing. For the last three decades, essay scoring has relied on human readers, who hold college degrees in a range of fields, have demonstrated writing ability, and qualify by reaching acceptable scoring-agreement rates with other readers on practice sets.
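
As a rough illustration of what those scoring-agreement rates measure, the short Python sketch below computes exact agreement (two readers assign the identical score) and adjacent agreement (their scores differ by at most one point on the 1-to-6 scale). The score lists are invented for the example; they are not data from any actual testing program.

    # Illustrative sketch: tallying a trainee reader's agreement with an
    # experienced reader on a practice set of essays scored 1-6.
    # The scores below are invented example data.

    def agreement_rates(reader_a, reader_b):
        """Return (exact, adjacent) agreement rates for two lists of scores."""
        pairs = list(zip(reader_a, reader_b))
        exact = sum(a == b for a, b in pairs) / len(pairs)
        adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
        return exact, adjacent

    trainee = [4, 3, 5, 2, 6, 4, 3, 5, 1, 4]   # trainee reader's scores
    veteran = [4, 4, 5, 2, 5, 4, 3, 6, 1, 4]   # experienced reader's scores

    exact, adjacent = agreement_rates(trainee, veteran)
    print(f"exact agreement: {exact:.0%}, exact-or-adjacent: {adjacent:.0%}")

Adjacent agreement is included because scoring programs typically tolerate a one-point difference on the six-point scale; the specific thresholds used to qualify readers are not given in the article.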

In addition to paying readers for training and for scoring essays, educational testing companies develop rubrics that specify which features to score and what characterizes each point on a 1-to-6 scale. Assessing essays also requires the time-consuming process of developing the prompts that pose essay questions.

With the demands on teachers' time, the need for quick feedback, and the labor costs of human scoring, educational researchers need to determine the most efficient way to assess standardized essays. From a business perspective, using artificial intelligence dramatically reduces the cost and time required to evaluate student writing. The algorithm developed by Educational Testing Service for the Graduate Management Admission Test (GMAT) can score 16,000 essays in 20 seconds.

Given the efficiency of AI scoring, it is unlikely that using only human readers will make business sense or satisfy the requirements for next-generation educational assessment.

Yet technological innovation is often disruptive. AI evaluators will, unfortunately, displace thousands of reader jobs. On a positive note, this disruption of the existing market creates new opportunities in computer programming, artificial intelligence, linguistics, business development, psychometrics, and Web design.

The History and Future of Machine Scoring

In 1967, former high school English teacher Ellis Page developed Project Essay Grade (PEG), the first successful automated essay scoring system, but it required IBM punch cards and mainframe computers. In the 1990s, this would all change with advances in computing, AI, and natural language processing. …
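
Page's system illustrates the basic recipe that machine scoring has followed ever since: count measurable surface features of an essay and fit a statistical model that maps those counts onto human scores. The Python sketch below shows that idea with ordinary least-squares regression; the features and the tiny training set are invented for illustration and are not Page's actual variables or data.

    # Minimal sketch of the PEG idea: predict human essay scores from simple,
    # countable surface features using linear regression. The features and the
    # toy training set are invented for illustration.
    import numpy as np

    def surface_features(essay):
        words = essay.split()
        return [
            len(words),                                       # essay length in words
            sum(len(w) for w in words) / max(len(words), 1),  # average word length
            essay.count(","),                                 # comma count
        ]

    # Toy training data: essays paired with human scores on the 1-6 scale.
    train_essays = [
        "Short answer.",
        "A longer response that develops one idea, with some detail, and a comma.",
        "A well organized essay would elaborate several points, cite the provided "
        "references, and reason carefully toward a conclusion.",
    ]
    train_scores = np.array([2.0, 4.0, 5.0])

    # Fit ordinary least-squares weights, including an intercept column.
    X = np.array([surface_features(e) for e in train_essays])
    X1 = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(X1, train_scores, rcond=None)

    # Score a new essay and clamp the prediction to the 1-6 rubric range.
    new_essay = "Another response of moderate length, organized around two ideas."
    x_new = np.concatenate([[1.0], surface_features(new_essay)])
    predicted = float(np.clip(x_new @ weights, 1, 6))
    print(f"predicted score: {predicted:.1f}")

Later systems draw on far richer features derived from natural language processing and much larger training sets, but the core step of fitting measured features to human scores has remained recognizable.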