Academic journal article International Journal of English Studies

Large-Scale Assessment of Language Proficiency: Theoretical and Pedagogical Reflections on the Use of Multiple-Choice Tests

Article excerpt


The new requirement that students in tertiary settings in Spain demonstrate a B1 or a B2 proficiency level of English, in accordance with the Common European Framework of Reference for Languages (CEFRL), has led most Spanish universities to develop a program of certification or accreditation of the required level. The first part of this paper provides a rationale for the type of test that has been developed at the Universidad Politécnica de Madrid for the accreditation of a B2 level, a multiple-choice version, and describes how it was constructed and validated. In the second part of the paper, the results from its administration to 924 students enrolled in different degree courses at a variety of schools and faculties of the university are analyzed on the basis of an item analysis of the final test version. To conclude, some theoretical as well as practical conclusions about testing grammar that affect the teaching and learning process are drawn.

KEYWORDS: language teaching and testing, large-scale testing, grammar tests, multiple choice tasks


The new B1 and B2 English proficiency requirements under the Common European Framework of Reference for Languages (CEFRL) imposed on undergraduate and postgraduate students have led most Spanish universities to develop accreditation or certification programs for these levels. The first part of this paper sets out the rationale for choosing a particular type of test, a multiple-choice test, for the accreditation of the B2 level of English at the Universidad Politécnica de Madrid, and describes how it was designed and validated. The second part analyzes the results of the large-scale administration of the test to a total of 924 students enrolled in several schools and faculties of the university. Finally, a series of theoretical and practical conclusions are drawn about the assessment of grammar and how it influences the teaching and learning processes.

KEYWORDS: language teaching and assessment, large-scale assessment, grammar tests, multiple-choice tasks


As communicative approaches to language teaching evolved (Savignon, 1977; Widdowson, 1990), communicative approaches to language testing focused on research into both communicative curriculum development and communicative language testing (Alderson & Hughes, 1981; Lee et al., 1985; Nunan, 1988). The notion of "directness" has important implications for the testing of communicative performance: a "direct test" claims to measure ability directly, whereas an "indirect test" requires the test-taker to perform more artificial tasks. Interviews or role-plays used to assess speaking, or writing an e-mail or an essay in the case of writing, are examples of direct tests (Berkoff, 1985; Connor, 1991; Cooper & Odell, 1999; Hamp-Lyons, 1995). By contrast, grammar and vocabulary tests, usually consisting of discrete-item tasks, are typically used as indirect tests of language ability.

Provided that test users can make an easy connection between test performance and future use, direct tests usually have higher face validity1 than indirect tests (Davies et al., 1999), and we agree with Nunan (1988: 117-118) that the degree to which a test appears to measure the knowledge it claims to measure should never be underestimated. Nevertheless, according to Harris (1969: 21) and Oller (1979: 52), this type of validity is not crucial in determining the general validity of a test. Davies (1990: 23) claims that although a test should have face validity, it must be the first to be disregarded if it conflicts with any of the other validities. Even so, failure to meet face validity can eventually undermine a test's public credibility, as face validity has much to do with general acceptance.

Aware that presenting a preliminary proposal of an indirect grammar test to assess proficiency in tertiary settings could be highly unpopular in terms of face validity, at the BAAL Conference 2011 (Argüelles et al.) we emphasized three points in relation to its practicality:

* First, that the proposal was made for a specific context in which students enrolling in a course of professional and academic English had to demonstrate a B2 level, with the administrative aspects of the evaluation as the priority. …
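The item analysis of the final test version mentioned in the abstract typically rests on classical indices such as item facility (the proportion of correct responses) and item discrimination (how well an item separates stronger from weaker test-takers). The sketch below is purely illustrative, using an invented toy response matrix rather than the study's actual 924-student dataset, and the upper-lower method with a 27% split is one common convention, not necessarily the procedure the authors used.

```python
# Illustrative classical item analysis for a multiple-choice test.
# Toy data only; not the study's actual responses.

def item_facility(item_scores):
    """Proportion of test-takers answering the item correctly (0..1)."""
    return sum(item_scores) / len(item_scores)

def item_discrimination(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index: difference in facility between
    the top and bottom `fraction` of test-takers ranked by total score."""
    n = max(1, round(len(total_scores) * fraction))
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = ranked[:n], ranked[-n:]
    p_upper = sum(item_scores[i] for i in upper) / n
    p_lower = sum(item_scores[i] for i in lower) / n
    return p_upper - p_lower

# Toy response matrix: rows = test-takers, columns = items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
for j in range(len(responses[0])):
    col = [row[j] for row in responses]
    print(f"item {j + 1}: facility={item_facility(col):.2f}, "
          f"discrimination={item_discrimination(col, totals):.2f}")
```

In practice, items with very high or very low facility, or with low (or negative) discrimination, are candidates for revision or removal when assembling the final test version.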
