Do High-Stakes Tests Improve Learning?
Test-based incentives, which reward or sanction schools, teachers, and students based on students' test scores, have dominated U.S. education policy for decades. But a recent study suggests that they should be used with caution and carefully evaluated.
Hout, Michael, Elliott, Stuart, Frueh, Sara, Issues in Science and Technology
The United States has long performed at a middling level on international assessments of students' math, reading, and science knowledge, trailing many other high-income countries. In their efforts to improve K-12 education, U.S. policymakers have increasingly turned to offering incentives--either to schools, to teachers, or to students themselves--to increase students' standardized test scores.
For example, the No Child Left Behind (NCLB) law, which has governed public education for more than 10 years, sanctions schools whose students do not perform well on standardized tests. More recently, states and school districts have experimented with awarding bonuses to teachers if their students' test scores climb. Twenty-five states target the incentives to students themselves by requiring them to pass an exit exam before receiving their diploma.
All of these policies share a fundamental principle: They reward or sanction students, teachers, or schools based on how well students score on standardized tests. Policymakers hope that holding various players in the education system accountable for how much students learn will motivate those players to improve student performance. But do test-based incentives actually drive improvements in student learning?
In an effort to answer that question, a recent study by the National Research Council took a comprehensive look at the available research on how incentives affect student learning. The study committee, composed of experts in education, economics, and psychology, examined a range of studies on the effects of many types of incentive programs. What it found was not encouraging: The incentive systems that have been carefully studied have had only small effects, and in many cases no effect, on student learning.
Measuring student learning
At best, any test can measure students' knowledge of only a subset of the content in a particular subject area; it is also generally more difficult to design test items at higher levels of cognitive complexity. These limitations take on greater significance when incentives are tied to the test results. Research has shown that incentives can encourage teachers to "teach to the test" by narrowing their focus to the material most likely to appear on the test. As a result, students' scores may be artificially inflated, because the scores reflect knowledge of only part of the material the students should know about the subject.
For example, if teachers move from covering the full range of material in eighth-grade mathematics to focusing only on the portion included on the test, their students' test scores may rise even as their learning in the untested part of the subject stays the same or even declines.
In measuring how incentives affect student learning of a subject, it is important to look at students' scores not on the high-stakes test that is tied to the incentives, but on low-stakes tests that are designed to provide a general picture of the quality of learning and do not have direct consequences for schools, teachers, or students. Because there is no incentive that would motivate teachers to narrow their instruction to the material covered by low-stakes tests, the scores on those tests, such as the National Assessment of Educational Progress (NAEP), are less likely to be inflated and can give a more reliable picture of student learning in a subject area. In conducting its review of the research, the committee focused mainly on studies that based their assessment on low-stakes tests.
The committee also limited its evaluation to studies that allowed researchers to draw causal conclusions about the effects of test-based incentives. This means that studies had to have a comparison group of students, teachers, or schools that were not subject to incentives or rewards, and that individuals or groups could not self-select into the comparison group. In addition, the committee looked only at studies of programs that had existed long enough to supply meaningful results, which means that some programs, particularly many involving performance pay for teachers, were too new to evaluate. …