Performance Assessment and Education Reform

Regardless of the value of performance assessments in the classroom, a measurement-driven reform strategy that relies on performance assessments to drive curriculum and instruction seems bound to fail, Mr. Haertel maintains.

BEGINNING with the publication in 1983 of A Nation at Risk, a stream of reports and pronouncements has fueled the popular perception that the U.S. education system is in crisis.1 Real and imagined declines over time in performance on tests such as the SAT and the National Assessment of Educational Progress (NAEP); cross-national comparisons on tests including the International Assessment of Educational Progress and, more recently, the Third International Mathematics and Science Study; and comparisons to benchmarks such as the NAEP achievement levels established by the National Assessment Governing Board have been publicized as evidence that our educational problems continue unabated. The use of test scores to index educational success or failure is almost never questioned. Low scores are bad news; high scores are good news. In the rhetoric of education reform, it often sounds as if improving the education system is synonymous with improving test scores.

In such a climate, the logic of high-stakes testing seems compelling. Test students and see what they can do. Hold them or their schools accountable if they fail to make the grade. Rather than micromanage schools, policy makers can dictate that content standards and performance standards be created to codify expected learning outcomes and then let teachers and school administrators determine how best to attain those outcomes.

It sounds like a rational management plan. If there are clear expectations, teachers will know what they are supposed to teach, students will see how hard they must work to make the grade, and taxpayers will know whether their schools are measuring up. If the standards are appropriate, if students and teachers are prepared to accept the challenge of meeting them, if the phase-in period for accountability is realistic, if reliable and valid tests are available to ascertain the extent of students' mastery, if teachers have the requisite knowledge and training to help students meet the challenge of new standards, if schools are not hobbled by extraneous demands and requirements, if necessary instructional materials and resources are available, if out-of-school factors are given appropriate consideration . . . then a measurement-driven accountability system ought to show just which students are working and which ones are slacking off, which teachers and schools should be rewarded and which ones should be punished.

It is not hard to understand why accountability testing is popular with policy makers. Testing enjoys broad popular support.2 Calling for more or higher-stakes testing is a visible, dramatic response to public concerns about education. Moreover, the idea that demanding higher test scores will improve schooling carries with it the not-too-subtle implication that students, teachers, and administrators just aren't trying hard enough. If efforts are redoubled, scores will rise. Proposing a new testing plan diverts attention from the problems alluded to by all those "ifs," including conflicting curricular expectations, inadequate teacher preparation, inadequate teaching materials and facilities, and the changing demography of the student population.3 Attacking those other problems is likely to take a lot of time and money, but calling for another new test costs next to nothing. Moreover, a new test can be implemented quickly, before the terms of current officeholders expire. Scores on an unfamiliar test are likely to be poor at the beginning and then to rise in years two and three. As Robert Linn observes, "The resulting overly rosy picture that is painted by short-term gains observed in most new testing programs gives the impression of improvement right on schedule for the next election. …