fields in the header. These include biographical details, text category, source title, and date. Appendix 4 shows the Text Info screen corresponding to the file header in Appendix 3. Both the bibliographical and the biographical information windows can be scrolled to reveal the full set of fields.
All the markup symbols can be retrieved from the corpus using ICECUP, either individually or in any combination. Students of dialogue can retrieve overlapping segments and all of the nonfluencies discussed above. Those interested in written discourse may wish to study, for example, how paragraphs begin and end, or the language of newspaper headlines. These searches are carried out by selecting the appropriate markup symbol from a complete list provided in ICECUP.
File header information is used to create subcorpora, that is, to isolate parts of the corpus for analysis. Researchers can restrict their analysis to a particular national corpus or to part of a national corpus. They can also create a subcorpus that cuts across national corpora, for example, a subcorpus of scripted monologues in British and American English.
The two types of markup, the markup symbols in the text and the information in the file headers, can also be combined in more sophisticated searches. For example, a researcher interested in overlapping speech in conversation might wish to see whether the relationship between the speakers has any significant effect on the amount or type of overlapping they produce. To retrieve the relevant data, a subcorpus of conversations must first be created. The user can then derive two further subcorpora from it: one in which the speakers are equals, and one in which they are disparates. Finally, the markup symbols for overlapping speech can be retrieved separately from each of these subcorpora and the results compared.
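The logic of the combined search described above (filter texts by header fields, derive further subcorpora, then count a markup symbol in each) can be sketched in a few lines. This is a minimal illustration only, not ICECUP's interface, which is a graphical tool. The `Text` class, the header field names (`variety`, `category`, `relationship`), their values, the sample texts, and the use of `<[>` as the opening symbol for an overlapping segment are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of a corpus text: header fields plus a
# markup-annotated body. Field names and values are illustrative assumptions,
# not the actual ICE header schema.
@dataclass
class Text:
    text_id: str
    header: dict
    body: str

# Assumed opening symbol for an overlapping segment (hypothetical here).
OVERLAP_START = "<[>"

def subcorpus(texts, **criteria):
    """Return the texts whose header matches every key/value pair in criteria."""
    return [t for t in texts
            if all(t.header.get(k) == v for k, v in criteria.items())]

def count_markup(texts, symbol):
    """Count occurrences of a markup symbol across a list of texts."""
    return sum(t.body.count(symbol) for t in texts)

# Invented sample texts, for illustration only.
corpus = [
    Text("GB-001", {"variety": "GB", "category": "conversation", "relationship": "equals"},
         "A: we went <[>there</[> B: <[>oh right</[>"),
    Text("GB-002", {"variety": "GB", "category": "conversation", "relationship": "disparates"},
         "A: please sit down B: thank you <[>doctor</[>"),
    Text("US-001", {"variety": "US", "category": "conversation", "relationship": "equals"},
         "A: <[>no way</[> B: <[>really</[> A: <[>yes</[>"),
    Text("GB-003", {"variety": "GB", "category": "scripted monologue"},
         "Today's lecture concerns grammar."),
    Text("US-002", {"variety": "US", "category": "scripted monologue"},
         "Good evening, and welcome."),
]

# Header-only subcorpus cutting across national corpora.
monologues = [t for t in subcorpus(corpus, category="scripted monologue")
              if t.header["variety"] in {"GB", "US"}]

# Combined search: conversations first, then split by speaker relationship,
# then retrieve the overlap markup from each derived subcorpus.
conversations = subcorpus(corpus, category="conversation")
equals = subcorpus(conversations, relationship="equals")
disparates = subcorpus(conversations, relationship="disparates")

overlaps = {
    "equals": count_markup(equals, OVERLAP_START),
    "disparates": count_markup(disparates, OVERLAP_START),
}
print(overlaps)
```

With these invented texts the script prints `{'equals': 5, 'disparates': 1}`. Raw counts are used to keep the sketch short; a real comparison would presumably normalize them, for example by word count or number of texts in each subcorpus.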