Comparing English Worldwide: The International Corpus of English

By Sidney Greenbaum

fields in the header. These include biographical details, text category, source title, and date. Appendix 4 shows the Text Info screen corresponding to the file header in Appendix 3. Both the bibliographical and the biographical information windows can be scrolled to reveal the full set of fields.

All the markup symbols can be retrieved from the corpus using ICECUP. They can be retrieved individually or in any combination. Students of dialogue can retrieve overlapping segments and all of the nonfluencies which have been discussed. Those interested in written discourse may wish to study how paragraphs begin and end, for example, or the language of newspaper headlines. These searches can be carried out by specifying the appropriate markup symbol from a complete list provided in ICECUP.

File header information is used in creating subcorpora, that is, in isolating parts of the corpus for analysis. Researchers can restrict their analysis to a particular national corpus or to part of a national corpus. In addition, they can create a subcorpus which cuts across national corpora, for example, a subcorpus of scripted monologues in British and American English.

The two types of markup can also be combined with each other in more sophisticated searches. For example, a researcher interested in overlapping speech in conversation might wish to see if the relationship between the speakers has any significant effect on the amount or type of overlapping which they produce. To retrieve the relevant data for this, a subcorpus of conversations must first be created. The user can then create two further subcorpora derived from this: one in which the speakers are equals, and one in which they are disparates. Finally, the markup symbols for overlapping speech can be retrieved separately from each of these subcorpora.
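The combined search described above (conversations first, then equals versus disparates, then overlap counts in each) can be sketched in a few lines of Python. This is only an illustration of the logic, not ICECUP itself: the `Text` record, the header field names `category` and `relationship`, the symbol name `overlap`, and the three sample texts are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Text:
    """One corpus text: hypothetical header fields plus the markup symbols it contains."""
    text_id: str
    header: Dict[str, str]                            # field names are assumptions
    markup: List[str] = field(default_factory=list)   # symbol names are assumptions


def subcorpus(texts: List[Text], keep: Callable[[Text], bool]) -> List[Text]:
    """Isolate the texts whose header information satisfies the predicate."""
    return [t for t in texts if keep(t)]


def count_symbol(texts: List[Text], symbol: str) -> int:
    """Count every occurrence of one markup symbol across a set of texts."""
    return sum(t.markup.count(symbol) for t in texts)


# Invented toy data, for illustration only.
corpus = [
    Text("T1", {"category": "conversation", "relationship": "equals"},
         ["overlap", "pause", "overlap"]),
    Text("T2", {"category": "conversation", "relationship": "disparates"},
         ["overlap"]),
    Text("T3", {"category": "scripted monologue", "relationship": "equals"},
         ["pause"]),
]

# Step 1: a subcorpus of conversations.
conversations = subcorpus(corpus, lambda t: t.header["category"] == "conversation")

# Step 2: two further subcorpora derived from it.
equals = subcorpus(conversations, lambda t: t.header["relationship"] == "equals")
disparates = subcorpus(conversations, lambda t: t.header["relationship"] == "disparates")

# Step 3: retrieve the overlap markup separately from each.
overlap_equals = count_symbol(equals, "overlap")
overlap_disparates = count_symbol(disparates, "overlap")
```

Deriving the second-level subcorpora from `conversations` rather than from the whole corpus mirrors the order of steps in the text, so that the scripted monologue is excluded before the speaker relationship is considered.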

Many texts in the corpus are composite, that is, they comprise separate samples combined to create a single 2,000-word text. These shorter samples are referred to as subtexts. The text units are numbered in a continuous sequence throughout the text, whether it is composite or not. A second number indicates the number of the subtext, for example: 〈#65:2〉. This is text unit 65, and it occurs in the second subtext. By convention, all texts have at least one subtext, so the subtext number is always at least 1. In spoken texts, the text unit numbers include additionally the speaker identification, e.g. 〈#12:3:A〉. If the text unit occurs within extra-corpus material (see Sect. 1.2), then the text unit number has the form 〈#X:34:1〉.
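The numbering convention in this note can be made concrete with a small parser. The following is a minimal sketch, assuming that the three forms shown above are the only ones, that a speaker is identified by capital letters, and that plain ASCII angle brackets may stand in for the printed ones; the names `TextUnitRef` and `parse_text_unit` are invented for the example.

```python
import re
from typing import NamedTuple, Optional


class TextUnitRef(NamedTuple):
    unit: int                 # running text unit number within the text
    subtext: int              # subtext number, always at least 1
    speaker: Optional[str]    # speaker identification, spoken texts only
    extra_corpus: bool        # True when the reference carries the X prefix


# Accept ASCII "<" ">" as well as common typographic angle brackets.
_TEXT_UNIT = re.compile(
    r"^[<\u2329\u27e8\u3008]#"
    r"(?:(?P<x>X):)?"              # optional extra-corpus prefix
    r"(?P<unit>\d+):(?P<subtext>\d+)"
    r"(?::(?P<speaker>[A-Z]+))?"   # optional speaker identification
    r"[>\u232a\u27e9\u3009]$"
)


def parse_text_unit(ref: str) -> TextUnitRef:
    """Parse a text unit reference such as <#65:2> or <#12:3:A>."""
    match = _TEXT_UNIT.match(ref.strip())
    if match is None:
        raise ValueError(f"not a text unit reference: {ref!r}")
    subtext = int(match.group("subtext"))
    if subtext < 1:
        raise ValueError("subtext number must be at least 1")
    return TextUnitRef(
        unit=int(match.group("unit")),
        subtext=subtext,
        speaker=match.group("speaker"),
        extra_corpus=match.group("x") is not None,
    )


written = parse_text_unit("<#65:2>")      # text unit 65, second subtext
spoken = parse_text_unit("<#12:3:A>")     # text unit 12, third subtext, speaker A
extra = parse_text_unit("<#X:34:1>")      # extra-corpus material
```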
To improve the readability of some citations, I have omitted markup symbols which are not relevant to the type under discussion.


NELSON, G. (1991a), 'Manual for Spoken Texts' (London: Survey of English Usage, University College London).

