Academic journal article International Journal of English Studies

Limited Aspects of Reality: Frames of Reference in Language Assessment

Language testers operate within two frames of reference: norm-referenced testing (NRT) and criterion-referenced testing (CRT). The former underpins the world of large-scale standardized testing that prioritizes variability and comparison. The latter supports substantive score meaning in formative and domain-specific assessment. Some claim that the criterion-referenced enterprise is dead, save its legacy in score reporting (Davidson, 2012: 198). We argue that announcing the demise of CRT is premature. But we do acknowledge that what now passes as CRT is in fact not criterion-referenced, but is based upon a corruption of the original meaning of "criterion" as domain-specific performance. This distortion took place when NRT co-opted the term "standard" to serve as a rationale for the measurement enterprise of establishing cut-scores to retrofit NR tests with meaning derived from external scales. The true heirs of the CRT movement are researchers who base test design on the careful analysis of construct and content in domain-specific communication.

KEYWORDS: language testing, criterion-referencing, norm-referencing, domain description, specific purpose testing, scoring criteria, standard setting


Quienes evalúan el aprendizaje de lenguas operan con dos marcos de referencia: la evaluación basada en la norma (EN) y la evaluación basada en criterios (EC). La primera subyace a la evaluación estandarizada, que prioriza la variabilidad y la comparación, mientras que la segunda fundamenta el significado de los resultados de la evaluación formativa en ámbitos específicos. Hay quienes afirman que la evaluación basada en criterios ha llegado a su fin, dejando como único legado el modo en que se comunican sus resultados (Davidson, 2012: 198). En este artículo defendemos que anunciar la defunción de la EC es prematuro. Sí admitimos, sin embargo, que lo que actualmente se considera EC de hecho no lo es, sino que parte de una corrupción del significado original de "criterio" como actuación relativa a un determinado ámbito. Esta distorsión tuvo lugar cuando la EN se apropió del término "estándar" como sustento teórico para el establecimiento de notas de corte en la actualización de exámenes basados en la norma cuyo significado se extrae de escalas de evaluación externas. Los verdaderos herederos de la EC son los investigadores que basan el diseño de exámenes en un escrupuloso análisis del constructo y de los contenidos de la comunicación específica de cada ámbito.

PALABRAS CLAVE: evaluación de lenguas, evaluación basada en criterios, evaluación basada en la norma, evaluación con fines específicos, criterios de evaluación, definición de estándares


The term "criterion-referenced testing" was first used by Glaser in a series of publications in the early 1960s to distinguish a newly conceptualised frame of reference from the existing "norm-referenced testing" (Glaser, 1963). As Glaser (1994a: 9) later put it, "...there was a need for development of proficiency instruments which assessed performance, not in terms of how an individual compared with other individuals, but with respect to how adequately he or she had attained the level of competence required for system operation." The machinery of norm-referenced testing had been evolving since the mid-19th century, and Edgeworth (1888: 626) correctly described it as "a species of sortition". Tests were the tools society had designed to rank-order individuals for the purpose of decision making, usually for employment or certification. The technology that made this use of tests possible was the curve of normal distribution. By creating items with maximum variance and high discrimination, test designers could spread the test-taking population out in such a way that the position of any individual could be compared with the proportion of the population gaining a lower or higher score (Fulcher, 2010: 35-42).
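The contrast between the two frames of reference described above can be illustrated with a minimal sketch. It assumes, purely for illustration, a normally distributed population of scores and a fixed required level of competence; the function names and every numerical value are hypothetical and do not come from any of the sources cited.

```python
from statistics import NormalDist


def norm_referenced_position(score, population_mean, population_sd):
    """Return the proportion of a normally distributed population
    expected to score below `score` (the norm-referenced reading)."""
    return NormalDist(mu=population_mean, sigma=population_sd).cdf(score)


def criterion_referenced_decision(score, required_level):
    """Return True if `score` reaches the required level of competence,
    regardless of how anyone else performed (the criterion-referenced reading)."""
    return score >= required_level


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the two readings.
    score = 65
    below = norm_referenced_position(score, population_mean=50, population_sd=10)
    print(f"Proportion of the population scoring lower: {below:.3f}")
    print(f"Proportion of the population scoring higher: {1 - below:.3f}")
    print("Meets the required level:",
          criterion_referenced_decision(score, required_level=70))
```

Under these assumed values the same score is read in two different ways: the first function locates the test taker relative to other people, while the second asks only whether a required level has been attained. A real criterion-referenced interpretation would rest on a description of domain-specific performance rather than on a single number, so the threshold here is no more than a stand-in.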

Norm-referenced tests and the interpretation of NRT scores are premised on the twin concepts of sortition and comparison, through procedures that establish the relative position of each member of the population. …

