Lexical Behaviour in Academic and Technical Corpora: Implications for ESP Development

Lexical approaches to Academic and Technical English have been documented by scholars since at least Cowie (1978). More recent work demonstrates how computer technology can assist in the effective analysis of corpus-based data (Cowie, 1998; Pedersen, 1995; Scott, 2000). For teaching purposes, this research has shown that the distinction between common coreness and diversity is a crucial issue. This paper outlines a way of dealing with vocabulary in English for Academic Purposes (EAP) instruction in the light of insights provided by empirical observation. Focusing mainly on collocation in the context of English for Specific Purposes (ESP) and, more precisely, within English for Information Science and Technology, we show how the results of a contrastive study of lexical items in small specific corpora can become the basis for teaching and learning ESP at the tertiary level. In the course of the study, we give an account of the functions of academic and technical lexis, define aspects of keywords and word frequency, and demonstrate the value of corpus-derived collocation information for the specific textual environment.


The areas of English for Specific Purposes (ESP) and corpus-based lexical studies seem to converge in the study of terminology (cf. Pedersen, 1995). The main aim in terminology studies is to create specialised dictionaries that reflect knowledge fields and concepts where these are related to the property of lexical use restriction.[1] In the textual collections, collocations play an essential role in the description of this specific language usage (Pedersen, 1995, p. 61). In this sense, word combinations work as building blocks that increase the learner's potential to command special languages.

However, the results of technical collocation studies have little to offer students in terms of academic performance and achievement: that is, they do not help learners meet the "stylistic expectations of the academic community" (Cowie, 1998, p. 12). This is because, in addition to specialised terminology, other types of combinations greatly influence the ESP learning context: for example, "seek the objective", "consider my suggestion", "the theory is canvassed", "argue rather less vehemently", and many other examples of academic discourse (Cowie, 1978, p. 132).

Our approach is based precisely on the distinction between technical and academic word behaviour. We are influenced by lexicographic work in which this double perspective is exploited (e.g., Lozano Palacios, 1999), and in which general academic vocabulary is distinguished from more specific word use.

Lexical levels or categories are established and described through the application of corpus-based studies. The design of a suitable corpus is of prime importance if lexical profiles are to be developed effectively. This means that aspects such as the size, type, balance, and integration of texts must be defined with the scope of ESP in mind. In this line of work, small representative corpora are favoured for specific purposes (Tribble, 1997, p. 116).

In addition, an electronic concordancer such as WordSmith Tools (Scott, 1996) is particularly useful for handling small text collections (Tribble, 2000). This includes dealing with differences between one given genre and the reference corpus, or between one specific theme and the overall body of subject texts (Scott, 1997). The items obtained from such comparisons are keywords, which signal the "aboutness" of the texts (Scott, 2000) and thus receive primary attention in the measurement of restricted language. General word usage, in contrast, is derived from lexical surveys across subject boundaries. These are examined through critical concordance data, also known as KWIC -- Key Word In Context.
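The two operations just described, comparing a small corpus against a reference corpus to extract keywords and displaying a word in its KWIC lines, can be illustrated with a minimal Python sketch. It is not the procedure applied in this study and does not reproduce WordSmith Tools: it assumes log-likelihood as the keyness statistic, a deliberately simple tokenizer, and hypothetical function names (tokenize, log_likelihood, keywords, kwic) chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness (assumed statistic) for a word occurring
    a times in a study corpus of c tokens and b times in a reference
    corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        # Keep positive keywords only: higher relative frequency in the study corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


def kwic(tokens, node, span=4):
    """Return one KWIC line per occurrence of node, with up to span
    words of context on each side and the node word in brackets."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines
```

In use, the study corpus and the reference corpus would each be read from file, passed through tokenize, and handed to keywords; the top-ranked items could then be inspected with kwic to see their typical collocates. Any frequency thresholds or significance cut-offs would have to be set separately and are left out of this sketch.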

With these notions in mind, particular subject areas are represented by specific corpora. The size and type of the sources can vary, depending on how similar or different the topics are. …

