The time required to compile the corpora was determined not only by copyright problems, but also by the number of different samples and the range of sources required. In comparison with other corpora, the individual components of ICE, with one million words each, are relatively small. However, one million words could be compiled very quickly if they were taken from a small number of sources. In contrast, the ICE corpora sample a very wide range of different sources. The corpus design dictates that we use at least 500 different texts, and because many of these are composite, the actual number of individual samples is much greater. The British corpus, for example, contains a total of 989 different samples. Though it makes the compilation stage much more time-consuming, our broad sampling procedure ensures that the corpora are representative of the English in general use in each participating country.