Academic journal article Education Next

Sizing Up Test Scores. (Forum)

Article excerpt

ONE OF THE BASIC CRITIQUES OF USING TEST SCORES for accountability purposes has always been that simple averages, except in rare circumstances, don't tell us much about the quality of a given school or teacher. The high scores of students in a wealthy suburban New Jersey school will reflect the contributions of well-educated parents, a communal emphasis on academic achievement, a stable learning environment at home, and enriching extracurricular opportunities. Likewise, the low scores of students in an inner-city Newark school will reflect the disadvantages of growing up poor. The urban school might have stronger leadership and a more dedicated teaching staff, yet still score substantially lower than the suburban school. As a result, in the past decade researchers have grown interested in ways of measuring and comparing the gains in academic achievement that a school or teacher elicits--in other words, a school or teacher's "value added." Say, for instance, that a school lifts its students from the 35th percentile on national tests to the 50th percentile. An accountability system that uses value-added assessment might judge this school more effective than a school whose students consistently score at the 60th percentile. A value-added system might also identify a school's best and worst teachers by tracking their students' gains in the course of a year. The prospect of measuring the contribution made by schools and teachers to their students' progress is winning a growing number of converts to value-added assessment. However, some practical complications stand in the way.

It is important, first, to distinguish between assessment for diagnostic purposes and assessment as a mechanism of accountability. Value-added assessment has demonstrated its value in the former capacity. Pioneering work in Dallas and in Tennessee has shown that value-added assessment provides information that can be useful when viewed in context by educators who understand local circumstances.

The more serious difficulties arise when value-added assessments are used to hold schools and teachers accountable, with high-stakes personnel decisions to follow. The danger is that such assessments will be used to supplant local decisionmaking, rather than to inform it. Unfortunately, our instruments of assessment are not precise or dependable enough for this purpose. I will discuss three problems: 1) current methods of testing don't measure gains very accurately; 2) some of the gains may be attributable to factors other than the quality of a given school or teacher; and 3) we lack a firm basis for comparing gains of students of different levels of ability.

* Measured gains are noisy and unstable.

Tests are not perfect measures of student ability or achievement. A student's performance on any given test will be due partly to true ability and partly to random influences (distractions during the test, the student's emotional state that day, a fortuitous selection of test items, and so on). This test "error" causes problems enough when we attempt to assess a student's level of achievement. The problems are significantly compounded when we take it a step further, to measuring achievement gains. A gain score is the difference between two test scores, each of which is subject to measurement error. The measurement errors on the two tests, taken months apart from each other, are unlikely to be related (after all, these are random influences). When we subtract one score from another, the measurement errors do not cancel out. However, a student's true ability does not change that much from one test occasion to another. When we subtract one score from another, a good deal of the portion of the scores that represents true ability will cancel out. The result: the proportion of a gain score that represents measurement error is magnified vis-a-vis the initial scores. In statistical parlance, gain scores are much noisier than level scores. …
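The arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a calculation from the article: the variances (90 for true ability, 10 for measurement error) and the 0.9 correlation between a student's true ability on the two test occasions are hypothetical numbers, and the function names are invented for this sketch. It also assumes the two tests have equal variances and that their measurement errors are unrelated.

```python
import random
import statistics


def level_reliability(true_var, error_var):
    """Share of a single test score's variance that reflects true ability."""
    return true_var / (true_var + error_var)


def gain_reliability(true_var, error_var, true_corr):
    """Share of a gain score's variance that reflects true change.

    Assumes equal true-score and error variances on both tests, and
    measurement errors that are independent across the two occasions.
    """
    true_gain_var = 2 * true_var * (1 - true_corr)  # most true ability cancels
    error_gain_var = 2 * error_var                  # unrelated errors add up
    return true_gain_var / (true_gain_var + error_gain_var)


def simulated_gain_reliability(true_var, error_var, true_corr, n=20000, seed=0):
    """Check the formula by simulating n students (all values hypothetical)."""
    rng = random.Random(seed)
    true_sd, error_sd = true_var ** 0.5, error_var ** 0.5
    true_gains, observed_gains = [], []
    for _ in range(n):
        # True ability on the two occasions, correlated at true_corr.
        t1 = rng.gauss(0, true_sd)
        t2 = true_corr * t1 + (1 - true_corr ** 2) ** 0.5 * rng.gauss(0, true_sd)
        # Observed scores add a fresh, unrelated error on each occasion.
        o1 = t1 + rng.gauss(0, error_sd)
        o2 = t2 + rng.gauss(0, error_sd)
        true_gains.append(t2 - t1)
        observed_gains.append(o2 - o1)
    return statistics.pvariance(true_gains) / statistics.pvariance(observed_gains)


# Illustrative, assumed values; none of these come from the article.
TRUE_VAR, ERROR_VAR, TRUE_CORR = 90.0, 10.0, 0.9

print("level score reliability:", round(level_reliability(TRUE_VAR, ERROR_VAR), 2))
print("gain score reliability: ", round(gain_reliability(TRUE_VAR, ERROR_VAR, TRUE_CORR), 2))
print("simulated gain reliability:",
      round(simulated_gain_reliability(TRUE_VAR, ERROR_VAR, TRUE_CORR), 2))
```

Under these assumed numbers, about 90 percent of the variation in a single test score reflects true ability, but only about 47 percent of the variation in the gain score reflects true change; the rest is measurement error. The closer a student's true ability on the second test tracks the first, the worse this ratio becomes.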
