At the Intersection of Law and Psychometrics: Explaining the Validity Clause of No Child Left Behind
Superfine, Benjamin Michael, Journal of Law and Education
On January 8, 2002, President George W. Bush signed No Child Left Behind (NCLB), a law reauthorizing the Elementary and Secondary Education Act (ESEA). NCLB represents one of the greatest intrusions of the federal government into education policy, an area traditionally reserved to the states. Responding to the well-publicized shortcomings of education in the United States, NCLB mandates education reform on a national scale.1 In particular, NCLB requires states to develop and use standardized tests in sweeping and unprecedented ways.
Before NCLB, states had been using standardized tests for their own purposes. For example, many states had already instituted a regime of "high stakes" testing, a practice that conditions an individual student's ability to graduate, be promoted from one grade to the next, or be placed in a particular track on the results of a standardized test.2 In addition, states had been basing lower stakes decisions on test scores. Decisions of this sort are more varied than high stakes decisions and include consequences such as which type of diploma a student receives.3 States continue to pass laws that involve both high stakes and lower stakes tests.4
Although NCLB also requires testing, the consequences of tests under NCLB can be quite different from the consequences of tests under state law. Instead of directly affecting individual students through high stakes or lower stakes decisions, NCLB applies pressure at the state, district, and school levels.5 By accepting federal Title I funds under NCLB, states, districts, and schools become accountable for the performance of their students. If students fail to meet performance goals on certain assessments, an array of consequences awaits their schools. These consequences range from participation in a system of public school choice to having entire schools restructured or run by a for-profit company. Thus, NCLB has instituted a complex testing regime with the potential to drastically change the face of schooling in the United States.
For this reason above all, we should ensure that the testing practices employed in the service of NCLB are valid. Unfortunately, what it takes to validate a testing practice is often unclear. As the use of tests has grown, the concept of validity has expanded as well. Previously, validity could be captured with relatively simple quantitative measurements. However, the work of many modern theorists indicates that validity has evolved into a much more qualitative, argument-based concept. According to some of the most influential modern psychometric theorists, validity pertains not to tests themselves but to the interpretations and uses of test scores.6 While certain quantitative measurements do play a role in evaluating the validity of these interpretations or uses, many theorists consider such measurements to be only one part of an extraordinarily multifaceted approach. This approach centers on explicitly identifying the assumptions underlying a testing practice and evaluating the tenability of each one. Although this approach does not yield the clear "yes or no" answers that policymakers might want about whether a given testing practice is valid, it is a realistic and nuanced approach grounded in modern developments in psychometric theory and practice.
Accordingly, the text of NCLB includes a "validity clause." This clause states that NCLB assessments must "be used for purposes for which such assessments are valid and reliable, and be consistent with relevant, nationally recognized professional and technical standards."7 Although the clause appears clear in its mandate to establish and maintain valid testing practices, closer inquiry reveals that much work is needed to articulate what it really means. The term "validity" has a long history in both psychometrics and the law. Over time, its definition has shifted dramatically, and the legal definition has not always matched the psychometric one. …