Comparing English Worldwide: The International Corpus of English

Edited by Sidney Greenbaum

4
Markup Systems

GERALD NELSON


INTRODUCTION

Markup is the first level of annotation applied to the component corpora in ICE. It may be divided into two distinct types: textual markup, which is added to the texts themselves, and bibliographical and biographical markup, which is stored externally in the form of a file header for each text. The system for textual markup is based on a proposal by Rosta (1990) and is fully described in two manuals, one each for spoken and written texts (Nelson, 1991a, 1991b). The system for encoding bibliographical and biographical information is described in Nelson (1991c). In this paper I will discuss both markup types in turn, giving examples from the British ICE corpus (ICE-GB). Finally, I will discuss some of the ways in which markup is used in text retrieval.


1. TEXTUAL MARKUP

Textual markup encodes features of the original text that are lost when it is converted into a computerized text file. The texts are stored as plain ASCII files, so in written texts, for example, typographic features such as boldface, italics, and underlining are lost during computerization. In spoken texts, the transcription must be marked up to indicate such features as pauses, speaker turns, and overlapping segments. These textual features are encoded by adding markup symbols to the text. All markup symbols are enclosed within angle brackets. In most cases they appear in pairs, with an opening symbol 〈symbol〉 and a closing symbol 〈/symbol〉. For example, if the word 'every' appears in boldface in the original printed text, then it will appear as 〈bold〉every〈/bold〉 in the corpus. Similarly, headings are enclosed within 〈h〉 and 〈/h〉, while paragraphs are enclosed within 〈p〉 and 〈/p〉. The markup symbols are inserted manually, but the process is partially automated by the Markup Assistant program, a set of WordPerfect macros which assigns whole markup symbols to single keys. When the markup has been applied, the CHECKMUP program in ICECUBE checks that all the symbols are valid and that every opening symbol has a corresponding closing one. A complete list of the ICE markup
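
The kind of check that CHECKMUP performs can be illustrated with a short sketch. The Python below is a minimal illustration, not the ICECUBE code: the function name `check_markup`, its parameters and its error messages are hypothetical. It assumes that symbols appear in the ASCII files as `<name>` and `</name>`, that the caller supplies the inventory of valid paired symbols, and, since only "most" symbols come in pairs, a separate inventory of symbols that stand alone. Following the description above, it requires only that each closing symbol is preceded by an opening symbol of the same name and that nothing is left open. It does not enforce nesting.

```python
import re
from collections import Counter
from typing import Iterable

# One markup symbol: "<name>" opens it, "</name>" closes it (ASCII brackets assumed).
SYMBOL = re.compile(r"<(/?)([^<>/]+)>")


def check_markup(text: str, paired: Iterable[str], unpaired: Iterable[str] = ()) -> list[str]:
    """Return error messages for invalid or unbalanced symbols (empty list = OK).

    `paired` and `unpaired` are hypothetical symbol inventories supplied by the
    caller; the real ICE inventory is given in the markup manuals.
    """
    paired, unpaired = set(paired), set(unpaired)
    open_counts: Counter = Counter()  # opening symbols not yet closed, per name
    errors: list[str] = []
    for match in SYMBOL.finditer(text):
        slash, name = match.group(1), match.group(2)
        if name in unpaired:
            # Stand-alone symbols are valid but must never be "closed".
            if slash:
                errors.append(f"unpaired symbol <{name}> should not be closed (offset {match.start()})")
        elif name not in paired:
            errors.append(f"invalid symbol <{slash}{name}> at offset {match.start()}")
        elif not slash:
            open_counts[name] += 1
        elif open_counts[name] > 0:
            open_counts[name] -= 1
        else:
            errors.append(f"</{name}> at offset {match.start()} has no opening symbol")
    # Anything still open at the end of the text lacks a closing symbol.
    for name in sorted(open_counts):
        if open_counts[name]:
            errors.append(f"<{name}> opened {open_counts[name]} time(s) without closing")
    return errors


if __name__ == "__main__":
    sample = "<h>Markup</h> <p>It is <bold>every</bold> word.</p>"
    print(check_markup(sample, paired={"h", "p", "bold"}))  # []
    print(check_markup("<p>One <bold>two</p>", paired={"p", "bold"}))
    # ['<bold> opened 1 time(s) without closing']
```

Counting each symbol name separately, rather than keeping a stack of open symbols, matches the requirement as stated. A stricter checker could push opening symbols onto a stack and also reject overlapping pairs such as `<p><bold>...</p></bold>`.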
