The National Assessment of Educational Progress is considered the "gold standard" of assessment. However, Mr. Pellegrino warns, those who are setting proficiency standards for state tests to meet the requirements of NCLB should be wary of using NAEP's achievement levels as a guide.
CONFUSION and controversy frequently surround the process of setting performance standards for a state's high-stakes achievement assessments. This is especially problematic because performance data from such tests are used to meet the adequate yearly progress (AYP) provisions of the No Child Left Behind (NCLB) legislation. In light of these pressures, there is a natural tendency for policy makers, the press, and the public to use the National Assessment of Educational Progress (NAEP) for comparisons and guidance. There are two primary reasons why NAEP, rightly or wrongly, is often singled out in this way. First, NAEP is considered the "gold standard" of educational assessments, largely because of its history as a national indicator and the quality and care that have gone into its design and development. Second, because NAEP is seen as a high-quality indicator of academic achievement, its performance standards are perceived to have greater rigor and validity than those set for many other assessments, including the achievement tests developed by individual states.
NAEP results have been compared with recent state achievement test results, and, appropriately or inappropriately, they will probably also be used to evaluate the standards-setting processes of states that have developed new assessments or expanded the number of grades assessed. Typically, such changes in a state's assessment program are needed to meet the NCLB requirement, which took effect in 2006, that all students in grades 3-8 be tested in both mathematics and reading.
To make this discussion a bit more concrete, consider as an example Wyoming's new state test known as PAWS (Proficiency Assessment for Wyoming Students). PAWS is quite different in purpose and design from Wyoming's prior test, known as the WyCAS. Given that performance standards for PAWS need to be established and that those standards are likely to be scrutinized carefully both within Wyoming and at the federal level, it is reasonable to ask whether NAEP provides appropriate standards for Wyoming or any other state in which new standards must be set.
To make such an appraisal, one needs to understand three things. First, setting standards is a judgment carried out by reasonable people, and it occurs in a social and political context. As such, it is influenced by multiple factors, including the nature of the assessment itself, current goals and aspirations for the educational enterprise, practical considerations, sources of comparative data such as NAEP, and immediate social consequences. Second, one must consider the history of and motivation for establishing the high performance standards that have become NAEP's trademark and that now serve as the "standard for comparison." Third, one must consider the validity of NAEP's standards: do they mean what people think they mean, and do they apply to the interpretation of students' performance on a state's high-stakes NCLB achievement test?
UNDERSTANDING NAEP STANDARDS AS EDUCATION POLICY STATEMENTS
Although NAEP has been operational since the late 1960s, the practice of reporting NAEP results in terms of performance standards is little more than 15 years old. Yet in that short time, the labels Basic, Proficient, and Advanced have become part of the testing landscape, adopted not only by NAEP but also by many state testing programs, including Wyoming's WyCAS and now PAWS.
The genesis of reporting performance standards can be traced to the 1989 education summit in Charlottesville, Virginia. The summit participants, including President George H. …