Second-Language Corpora
Although the number of corpus-linguistic publications has increased dramatically in the last few years (e.g. Svartvik, 1992), the special problems and the potential of second-language corpora have not been given adequate systematic treatment. In sociolinguistic textbooks or in overviews of English as a world language the distinction between English as a Native Language (ENL), English as a Second Language (ESL), English as an International Language (EIL), and English as a Foreign Language (EFL) has been widely recognized as one of the special assets of English (cf. Asher, 1994 s.v.), but the implications of this have not penetrated to the heart of corpus linguistics.
Corpus linguistics began in the ENL context. The Survey of English Usage Corpus, the Lancaster-Oslo/Bergen (LOB) Corpus, and the Brown Corpus were the pioneering models of the first generation, combining a strong empirical data-based approach with a sensitivity for systematic (socio-)stylistic contextualism and variationism. In the EFL context (and its modern expansion, the EIL context) the data-based approach has been used since the heyday of error and contrastive analyses in the 1960s. The International Corpus of Learner English (ICLE, cf. Ch. 2 this volume) is a logical continuation of these approaches, providing a more systematic account than individual collections by schoolteachers or publishers. ENL and EFL corpora are linked rather closely: in the development of ENL corpora the direct applications in target-language grammars (such as the Oxford English Grammar, 1996) and dictionaries (such as the COBUILD Dictionary, 1987) have played a driving role, as EFL texts are measured against these models.
The aim of this paper is to show that the expansion of corpus-linguistic work into ESL contexts can raise new challenges and provide new opportunities for corpus linguistics. Data collection on ESL varieties has previously been limited to salient features, that is, 'deviations' from (near-)native-speaker intuition. But long lists of anecdotal evidence must remain linguistically unsatisfactory, as they leave open questions about the consistency, systematicity, and interrelatedness of linguistic features.
A major feature of the ICE philosophy is that it embraces ESL countries systematically in addition to ENL countries, to which many modern standard descriptions