Academic journal article Foreign Language Annals

A Rationale for Criterion-Referenced Proficiency Testing

Article excerpt


In the field of second language learning, there is growing interest in assessing not only speaking and writing proficiency but also learners' proficiency in reading and listening comprehension. As the profession begins to more rigorously assess reading and listening proficiency levels, it will be crucial that appropriate test design and test scoring procedures are applied.

A major reason for testing is to make classification decisions. Cronbach and Glaser made this point in the introduction to their now-classic 1957 textbook; yet the prevalence of classical statistical procedures based on average results, curves, and standard deviations has obscured the fact that there is a second approach to testing, referred to as criterion-referenced testing. This second approach to making test-based decisions contrasts with the historically predominant purpose for testing, and therefore with the procedures for test design, construction, scoring, and score interpretation that are most commonly used today. It is, therefore, useful to begin by comparing these two distinct philosophical approaches to making test-based decisions and to highlight how those differences in purpose are reflected in the design, construction, scoring, and interpretation of scores for each type of test.

Norm-Referenced and Criterion-Referenced Tests

In classroom settings, the most frequently used type of test compares test takers against each other and arrays those test takers by their scores from highest to lowest or vice versa. This approach is particularly applicable when the intent is to grade the test takers on a curve, as is often done for curriculum-based classroom tests. Because the test takers are compared against others, and their relative standing depends on the general, or normal, ability level of their comparison group, this type of test is called a norm-referenced (NR) test. Because the purpose of NR tests is to compare people against other people, the items selected for those tests are chosen primarily for their ability to distinguish between test takers of varying abilities. Items may be selected to cover specific topical areas, but if everyone answers an item correctly, that item is discarded because it does not discriminate among learners; that is, it does not separate the "best" from the "rest."
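The two NR operations described above, ordering test takers by score and discarding items that everyone answers correctly, can be illustrated with a minimal Python sketch. The names, the response data, and both helper functions are hypothetical and are not taken from the article; operational test development would rely on fuller item-discrimination statistics than this all-correct check.

```python
# Hypothetical data: each test taker's answers, one entry per item (1 = correct).
responses = {
    "Ana":   [1, 1, 0, 1],
    "Ben":   [1, 0, 0, 1],
    "Chloe": [1, 1, 1, 1],
}


def rank_test_takers(responses):
    """Order test takers from highest to lowest total score."""
    totals = {name: sum(answers) for name, answers in responses.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


def non_discriminating_items(responses):
    """Return indices of items that every test taker answered correctly."""
    rows = list(responses.values())
    n_items = len(rows[0])
    return [i for i in range(n_items) if all(row[i] == 1 for row in rows)]


ranking = rank_test_takers(responses)
dropped = non_discriminating_items(responses)

print(ranking)  # [('Chloe', 4), ('Ana', 3), ('Ben', 2)]
print(dropped)  # [0, 3]
```

In this sketch, each person's standing depends entirely on who else is in the group: adding or removing a test taker can change both the ranking and the set of items flagged for removal.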

In contrast, criterion-referenced (CR) tests do not compare test takers against each other. Rather, CR tests compare test takers against a set of clearly stated expectations or criteria. To provide enough information to assess whether someone has met a specific criterion, the expectations, or criterion statement, must describe three elements: (1) the task to be completed; (2) the conditions or contexts in which the task is to be performed; and (3) the performance standard, or level of success or accuracy, that is required. Because test takers' scores are not compared with or referenced to others' level of mastery, a CR assessment may reveal that all students, some students, or even no students actually meet the stated expectations and thus pass or fail a CR test.
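As a rough illustration of CR scoring, the sketch below records the three elements of a criterion statement and checks each test taker against the performance standard alone. The task wording, the conditions, the 0.8 accuracy threshold, and the score data are all assumptions made for this example, not values from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """The three elements of a criterion statement."""
    task: str                 # (1) the task to be completed
    conditions: str           # (2) the conditions or contexts of performance
    required_accuracy: float  # (3) the performance standard (assumed proportion correct)


def meets_criterion(correct, attempted, criterion):
    """Compare one test taker with the standard, never with other test takers."""
    return attempted > 0 and correct / attempted >= criterion.required_accuracy


# Hypothetical criterion and results: (items correct, items attempted).
criterion = Criterion(
    task="Identify the main idea of a short news paragraph",
    conditions="Authentic written text, no dictionary, timed",
    required_accuracy=0.8,
)
results = {"Ana": (9, 10), "Ben": (7, 10), "Chloe": (10, 10)}

decisions = {
    name: meets_criterion(correct, attempted, criterion)
    for name, (correct, attempted) in results.items()
}
print(decisions)  # {'Ana': True, 'Ben': False, 'Chloe': True}
```

Because each decision depends only on the individual's own performance, the same function could return True for every test taker or for none, which mirrors the point that all, some, or no students may meet the stated expectations.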

Both NR tests, designed to compare test takers against each other, and CR tests, designed to measure test takers' knowledge and skills against a predetermined performance standard, can be used in instructional settings: NR tests are more commonly used to assess course-specific learning and to assign course grades, while CR tests can be used to assess mastery of specific learning outcomes as well as curriculum-independent skills and higher-order, program-level instructional results. Because of their independence from curriculum and instruction as delivered in a particular teaching/learning context, CR tests can be used to compare the abilities of students from different classes as well as students with different learning experiences against a common set of external ability expectations (Brown & Hudson, 2002; Shrock & Coscarelli, 2007; Smith & Stone, 2009). …
