Corpus-Based Approaches to Semantic Interpretation in Natural Language Processing

By Ng, Hwee Tou; Zelle, John | AI Magazine, Winter 1997


Getting computer systems to understand natural language input is a tremendously difficult problem and remains a largely unsolved goal of AI. In recent years, there has been a flurry of research into empirical, corpus-based learning approaches to natural language processing (NLP). Whereas traditional NLP has focused on developing hand-coded rules and algorithms to process natural language input, corpus-based approaches apply automated learning techniques to corpora of natural language examples in an attempt to induce suitable language-processing models automatically. Traditional work in natural language systems breaks the process of understanding into broad areas of syntactic processing, semantic interpretation, and discourse pragmatics. Most empirical NLP work to date has focused on using statistical or other learning techniques to automate relatively low-level language processing such as part-of-speech tagging, segmenting text, and syntactic parsing. The success of these approaches, following on the heels of the success of similar techniques in speech-recognition research, has stimulated research in using empirical learning techniques in other facets of NLP, including semantic analysis--uncovering the meaning of an utterance.

In the area of semantic interpretation, there have been a number of interesting uses of corpus-based techniques. Some researchers have used empirical techniques to address a difficult subtask of semantic interpretation, that of developing accurate rules to select the proper meaning, or sense, of a semantically ambiguous word. These rules can then be incorporated as part of a larger system performing semantic analysis. Other research has considered whether, at least for limited domains, virtually the entire process of semantic interpretation might yield to an empirical approach, producing a sort of semantic parser that generates appropriate machine-oriented meaning representations from natural language input. This article is an introduction to some of the emerging research in the application of corpus-based learning techniques to problems in semantic interpretation.

Word-Sense Disambiguation

The task of word-sense disambiguation (WSD) is to identify the correct meaning, or sense, of a word in context. The input to a WSD program consists of real-world natural language sentences. Typically, a separate phase prior to WSD is assumed to have identified the correct part of speech of each word in the sentence (that is, whether a word is a noun, verb, and so on). In the output, each word occurrence w is tagged with its correct sense, in the form of a sense number i, where i corresponds to the i-th sense definition of w in its assigned part of speech. The sense definitions are those specified in some dictionary. For example, consider the following sentence:

In the interest of stimulating the economy, the government lowered the interest rate.

Suppose a separate part-of-speech tagger has determined that the two occurrences of interest in the sentence are nouns. The various sense definitions of the noun interest, as given in the Longman Dictionary of Contemporary English (LDOCE) (Bruce and Wiebe 1994; Procter 1978), are listed in table 1. In this sentence, the first occurrence of the noun interest is in sense 4, but the second occurrence is in sense 6. Another wide-coverage dictionary commonly used in WSD research is WORDNET (Miller 1990), which is a public-domain dictionary containing about 95,000 English word forms, with a rather refined sense distinction for words.
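The input and output format described above can be sketched in Python. This is only an illustration of the tagging scheme, not a disambiguator: the names `SenseTag`, `tokenize`, and `tag_occurrences` are hypothetical, the tokenizer is a deliberately crude stand-in, and the sense numbers 4 and 6 are supplied by hand from the example rather than predicted.

```python
import re
from dataclasses import dataclass
from typing import List

SENTENCE = ("In the interest of stimulating the economy, "
            "the government lowered the interest rate.")


@dataclass(frozen=True)
class SenseTag:
    """One tagged word occurrence (hypothetical record type)."""
    position: int  # 0-based index of the token in the sentence
    word: str      # the word occurrence w
    pos: str       # part of speech, assumed to come from an earlier tagging phase
    sense: int     # sense number i: the i-th dictionary definition of w for that part of speech


def tokenize(text: str) -> List[str]:
    """Crude tokenizer (an assumption): keep alphabetic runs, drop punctuation."""
    return re.findall(r"[A-Za-z]+", text)


def tag_occurrences(tokens: List[str], target: str, pos: str,
                    senses_in_order: List[int]) -> List[SenseTag]:
    """Attach hand-supplied sense numbers to each occurrence of `target`, left to right."""
    positions = [i for i, tok in enumerate(tokens) if tok.lower() == target]
    if len(positions) != len(senses_in_order):
        raise ValueError("need exactly one sense number per occurrence")
    return [SenseTag(i, tokens[i], pos, s)
            for i, s in zip(positions, senses_in_order)]


tokens = tokenize(SENTENCE)
# Sense numbers taken from the text: first occurrence is sense 4, second is sense 6.
tags = tag_occurrences(tokens, "interest", "noun", [4, 6])
for tag in tags:
    print(f"{tag.word} (token {tag.position}): {tag.pos}, sense {tag.sense}")
```

Under this tokenization the two occurrences of interest fall at token positions 2 and 11. A real WSD system would replace the hand-supplied list with a model that chooses each sense number from the surrounding context.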

WSD is a long-standing problem in NLP. To achieve any semblance of understanding natural language, it is crucial to figure out what each individual word in a sentence means. Words in natural language are known to be highly ambiguous, which is especially true for the frequently occurring words of a language. For example, in the WORDNET dictionary, the average number of senses for each noun for the most frequent 121 nouns in English is 7. …
