Statistical Techniques for Natural Language Parsing

By Charniak, Eugene | AI Magazine, Winter 1997

Syntactic parsing is the process of assigning a phrase marker to a sentence, that is, the process that, given a sentence such as "the dog ate," produces a structure like that in figure 1. In this example, I adopt the standard abbreviations: s for sentence, np for noun phrase, vp for verb phrase, and det for determiner.
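
If you want to work with such structures in a program, one common choice is to store a phrase marker as nested tuples whose first element is the node label. The Python sketch below is only an illustration under assumptions: figure 1 is not reproduced here, so the tree for "the dog ate" is a guess at it, the labels noun and verb (not among the abbreviations above) are assumed, and the helpers `bracketed` and `leaves` are hypothetical names.

```python
from typing import List, Union

# A phrase marker as nested tuples: (label, child, child, ...).
# Leaves are the words themselves. This tree is an assumed rendering of
# figure 1; the labels "noun" and "verb" are a guess.
Tree = Union[str, tuple]

THE_DOG_ATE: Tree = (
    "s",
    ("np", ("det", "the"), ("noun", "dog")),
    ("vp", ("verb", "ate")),
)


def bracketed(tree: Tree) -> str:
    """Render a tree in labeled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"


def leaves(tree: Tree) -> List[str]:
    """Read the words off the leaves, left to right, recovering the sentence."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


print(bracketed(THE_DOG_ATE))         # (s (np (det the) (noun dog)) (vp (verb ate)))
print(" ".join(leaves(THE_DOG_ATE)))  # the dog ate
```

Reading the leaves back off the tree recovers the input sentence, a cheap sanity check on any parser's output.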

[Figure 1. ILLUSTRATION OMITTED]

It is generally accepted that finding the sort of structure shown in figure 1 is useful in determining the meaning of a sentence. Consider a sentence such as "salespeople sold the dog biscuits." Figure 2 shows two structures for this sentence. Note that the two have different meanings: On the left, the salespeople are selling dog biscuits, but on the right, they are selling biscuits to dogs. Thus, finding the correct parse corresponds to determining the correct meaning.

[Figure 2. ILLUSTRATION OMITTED]

Figure 2 also exemplifies a major problem in parsing, syntactic ambiguity--sentences with two or more parses. In such cases, it is necessary for the parser (or the understanding system in which the parser is embedded) to choose the correct one among the possible parses.

However, this example is misleading in a fundamental respect: It implies that we can assign at least a semiplausible meaning to all the possible parses. For most grammars (certainly for the ones statistical parsers typically deal with), this is not the case. Such grammars would assign dozens, possibly hundreds, of parses to this sentence, ranging from the reasonable to the uninterpretable, with the majority at the uninterpretable end of things. To take but one example, a grammar I have been using has the rule

np → np np

This rule would be used in the analysis of a noun phrase such as "10 dollars a share," where the two nps "10 dollars" and "a share" are part of the same np. The point here is that this rule would allow the third parse of the sentence, shown in figure 3, and this parse has no obvious meaning associated with it: the best I can do is an interpretation in which biscuits is the name of the dog. In fact, most of the parses that wide-coverage grammars find are like this one: pretty senseless.
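
To see how a single rule multiplies the parses, the following Python sketch counts the trees a small context-free grammar assigns to "salespeople sold the dog biscuits," with and without np → np np. The grammar, the lexicon, and the function name `count_parses` are hypothetical stand-ins, not the author's grammar; they are chosen so that the two readings of figure 2 come out, and so that adding np → np np contributes exactly one more parse, grouping "the dog" and "biscuits" into a single np (presumably the analysis in figure 3).

```python
from functools import lru_cache

# Hypothetical toy grammar, not the author's: (left-hand side, right-hand side).
# Symbols in POS_TAGS are parts of speech that match a single word.
BASE_RULES = [
    ("s", ("np", "vp")),
    ("vp", ("verb", "np")),             # sold [the dog biscuits]
    ("vp", ("verb", "np", "np")),       # sold [the dog] [biscuits]
    ("np", ("det", "noun")),
    ("np", ("det", "noun", "noun")),    # noun compound: the dog biscuits
    ("np", ("noun",)),
]
NP_NP = ("np", ("np", "np"))            # the rule np -> np np from the text
POS_TAGS = {"det", "noun", "verb"}
LEXICON = {
    "salespeople": {"noun"},
    "sold": {"verb"},
    "the": {"det"},
    "dog": {"noun"},
    "biscuits": {"noun"},
}


def count_parses(words, rules, start="s"):
    """Count the distinct trees rooted in `start` that span all of `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        # Number of ways `sym` can derive words[i:j].
        if sym in POS_TAGS:
            return 1 if j - i == 1 and sym in LEXICON.get(words[i], set()) else 0
        return sum(count_seq(rhs, i, j) for lhs, rhs in rules if lhs == sym)

    def count_seq(rhs, i, j):
        # A one-symbol right-hand side (e.g. np -> noun) covers the whole span.
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # Otherwise cut words[i:j] into len(rhs) nonempty pieces, one per symbol.
        # Each piece is shorter than the span, so np -> np np cannot recurse forever.
        return sum(count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return count(start, 0, len(words))


sentence = "salespeople sold the dog biscuits".split()
print(count_parses(sentence, BASE_RULES))            # 2
print(count_parses(sentence, BASE_RULES + [NP_NP]))  # 3

# The same rule on a run of n nouns gives the Catalan numbers.
print([count_parses(["dog"] * n, BASE_RULES + [NP_NP], start="np")
       for n in range(1, 8)])                        # [1, 1, 2, 5, 14, 42, 132]
```

With this toy grammar the sentence gets two parses without the rule and three with it. Applied to a string of n nouns, np → np np alone already yields 1, 1, 2, 5, 14, 42, 132 parses for n = 1 to 7, which hints at how a wide-coverage grammar reaches dozens or hundreds of parses for an ordinary sentence.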

[Figure 3. ILLUSTRATION OMITTED]

A usually unstated, but widely accepted, assumption in the nonstatistical community has it that some comparatively small set of parses for a sentence are legitimate ambiguities and that these parses have interpretations associated with them, albeit pretty silly ones sometimes. Furthermore, it is assumed that deciding between the legitimate parses is the responsibility not of the parser but, rather, of some syntactic disambiguation unit working either in parallel with the parser or as a postparsing process. Thus, our hypothetical nonstatistical traditionalist might say that the parser must rule out the structure in figure 3 but would be within its rights to remain undecided between those in figure 2.

By contrast, statistical parsing researchers assume that there is a continuum and that the only distinction to be drawn is between the correct parse and all the rest. The fact that we were able to find some interpretation for the parse in figure 3 supports this continuum view. To put it another way, in this view of the problem, there is no difference between parsing on the one hand and syntactic disambiguation on the other: it's parsing all the way down.
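
One simple way to act on this view is a probabilistic context-free grammar: every rule carries a probability, a parse scores the product of the probabilities of the rules it uses, and the parser returns the highest-scoring parse, with no separate disambiguation stage. The Python sketch below is illustrative only: the toy grammar, its rule probabilities, and the names `all_parses` and `bracketed` are invented here, and nothing in it is claimed to be the specific model the article goes on to describe.

```python
from functools import lru_cache

# Hypothetical toy PCFG: (lhs, rhs, probability). The numbers are invented;
# for each left-hand side they sum to 1.
RULES = [
    ("s", ("np", "vp"), 1.0),
    ("vp", ("verb", "np"), 0.7),
    ("vp", ("verb", "np", "np"), 0.3),
    ("np", ("det", "noun"), 0.4),
    ("np", ("det", "noun", "noun"), 0.2),
    ("np", ("noun",), 0.3),
    ("np", ("np", "np"), 0.1),
]
POS_TAGS = {"det", "noun", "verb"}
# One tag per word, so word probabilities are identical across parses and omitted.
LEXICON = {"salespeople": "noun", "sold": "verb", "the": "det",
           "dog": "noun", "biscuits": "noun"}


def all_parses(words, start="s"):
    """Return every (tree, probability) pair for `words`, most probable first."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def parses(sym, i, j):
        # All (tree, probability) pairs for `sym` spanning words[i:j].
        if sym in POS_TAGS:
            ok = j - i == 1 and LEXICON.get(words[i]) == sym
            return (((sym, words[i]), 1.0),) if ok else ()
        found = []
        for lhs, rhs, p in RULES:
            if lhs == sym:
                for children, q in seq(rhs, i, j):
                    found.append(((sym, *children), p * q))
        return tuple(found)

    def seq(rhs, i, j):
        # All ways to cover words[i:j] with the symbols of rhs, in order.
        if len(rhs) == 1:
            return [((t,), q) for t, q in parses(rhs[0], i, j)]
        return [((t,) + rest, q * r)
                for k in range(i + 1, j - len(rhs) + 2)
                for t, q in parses(rhs[0], i, k)
                for rest, r in seq(rhs[1:], k, j)]

    return sorted(parses(start, 0, len(words)), key=lambda tp: tp[1], reverse=True)


def bracketed(tree):
    """Render a tree in labeled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"


ranked = all_parses("salespeople sold the dog biscuits".split())
for tree, prob in ranked:
    print(f"{prob:.5f}  {bracketed(tree)}")
best_tree, best_prob = ranked[0]  # the parser's single answer
```

With these made-up numbers the compound reading (selling dog biscuits) scores 0.042, the ditransitive reading (selling biscuits to dogs) 0.0108, and the np → np np reading 0.00252. The senseless parse is never ruled out; it is simply ranked last, which is the continuum view in miniature.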

Part-of-Speech Tagging

The view of disambiguation as inseparable from parsing is well illustrated by the first natural language-processing task to receive a thoroughgoing statistical treatment--part-of-speech tagging (henceforth, just tagging). A tagger assigns to each word in a sentence the part of speech that it assumes in the sentence. Consider the following example:

The     can             will            rust
det*    modal-verb      modal-verb*     noun
        noun*           noun            verb*
        verb            verb

Under each word, I give some of its possible parts of speech in order of frequency; the correct tag is marked with an asterisk. …
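
The frequency ordering in the table already suggests a baseline: give every word its most frequent tag. The Python sketch below, assuming the table's orderings, shows that this baseline gets can and rust wrong (the correct tags are det, noun, modal-verb, verb). For contrast, it also runs a Viterbi search over a bigram hidden Markov model, a standard statistical tagging technique that scores each tag by how well it fits the preceding tag as well as the word. The transition and emission probabilities, and the names `TAGS_BY_FREQUENCY`, `TRANS`, `EMIT`, `most_frequent_tags`, and `viterbi`, are invented for this example.

```python
# Tags for each word of "The can will rust", most frequent first,
# copied from the article's table.
TAGS_BY_FREQUENCY = {
    "the": ["det"],
    "can": ["modal-verb", "noun", "verb"],
    "will": ["modal-verb", "noun", "verb"],
    "rust": ["noun", "verb"],
}

# Hypothetical bigram HMM; every number is invented for illustration.
# TRANS[prev][tag] = P(tag | previous tag); "<s>" marks the sentence start.
TRANS = {
    "<s>": {"det": 0.5, "noun": 0.3, "modal-verb": 0.1, "verb": 0.1},
    "det": {"noun": 0.9, "modal-verb": 0.05, "verb": 0.05},
    "noun": {"modal-verb": 0.4, "verb": 0.3, "noun": 0.2, "det": 0.1},
    "modal-verb": {"verb": 0.8, "noun": 0.1, "modal-verb": 0.05, "det": 0.05},
    "verb": {"det": 0.5, "noun": 0.3, "modal-verb": 0.1, "verb": 0.1},
}
# EMIT[tag][word] = P(word | tag), listed only for the words in the example.
EMIT = {
    "det": {"the": 0.5},
    "noun": {"can": 0.002, "will": 0.001, "rust": 0.001},
    "modal-verb": {"can": 0.2, "will": 0.3},
    "verb": {"can": 0.001, "will": 0.001, "rust": 0.002},
}
TAGS = list(EMIT)


def most_frequent_tags(words):
    """Baseline: give each word its most frequent tag, ignoring context."""
    return [TAGS_BY_FREQUENCY[w][0] for w in words]


def viterbi(words):
    """Most probable tag sequence under the toy bigram HMM."""
    # best[tag] = (probability, tag path) of the best sequence ending in tag.
    best = {"<s>": (1.0, [])}
    for word in words:
        step = {}
        for tag in TAGS:
            emit = EMIT[tag].get(word, 0.0)
            if emit == 0.0:
                continue  # this tag never produces this word
            prob, prev = max(((p * TRANS[q].get(tag, 0.0), q) for q, (p, _) in best.items()),
                             key=lambda pair: pair[0])
            step[tag] = (prob * emit, best[prev][1] + [tag])
        best = step
    return max(best.values(), key=lambda pair: pair[0])[1]


sentence = "the can will rust".split()
print(most_frequent_tags(sentence))  # ['det', 'modal-verb', 'modal-verb', 'noun']
print(viterbi(sentence))             # ['det', 'noun', 'modal-verb', 'verb']
```

Under these toy numbers a modal directly after a determiner is improbable, so the path through noun for can wins even though modal-verb is the more common tag for that word, and the modal that follows in turn makes verb the better tag for rust.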
