Comparing English Worldwide: The International Corpus of English

By Sidney Greenbaum

9
AUTASYS: Grammatical Tagging and Cross-Tagset Mapping

ALEX CHENGYU FANG


1. INTRODUCTION

Ever since the advent of the first computer linguistic corpus in the 1960s, linguists and computer programmers have been working on the annotation of material thus stored. Word-class tagging, the assignment of an unambiguous indication of the grammatical word class to each word in a text, has been in great demand, not only in lexicographical and grammatical studies, but also in natural language processing (NLP), an area where the corpus-based, or more specifically, probabilistic approach is becoming increasingly popular. Taggers have flourished, and the past twenty years or so have witnessed TAGGIT (Greene and Rubin, 1971), CLAWS (Marshall, 1983; Garside et al., 1987), VOLSUNGA (DeRose, 1988), AGTS (Huang, 1991), and TOSCA (Oostdijk, 1991), to name just a few. Tagsets differing in various respects have also come into being, with Brown (Francis, 1980), LOB (Johansson et al., 1986), and Lund (Svartvik, 1987) as the best known. Most recently, a tagset has been designed at the Survey of English Usage (SEU), University College London (Greenbaum and Ni, 1994; Greenbaum, 1995), which has been used to annotate the one-million-word British component of the International Corpus of English (ICE-GB, cf. Greenbaum, 1992).

This has created an intriguing situation in corpus annotation. On the one hand, compilers of corpora vary in what they intend as the primary uses of their corpora. Grammarians, lexicographers, language teachers, and NLP researchers naturally want different information from corpus annotation: grammatical, morphological, discoursal, statistical, semantic, pragmatic, or prosodic. On the other hand, unfortunately, we have not seen any single annotation scheme that meets all these requirements. Corpora thus differently annotated according to different schemes have become 'isolated islands', rendering cross-corpora studies virtually impossible. Consequently, it is desirable that either a standard annotation scheme be agreed upon in this field, or flexible systems be designed that can readily adapt themselves to different annotation schemes.

The tagger described in this chapter, AUTASYS, was designed by Alex Chengyu
