Magazine article AI Magazine

Statistical Techniques for Natural Language Parsing

Syntactic parsing is the process of assigning a phrase marker to a sentence, that is, the process that given a sentence such as "the dog ate" produces a structure like that in figure 1. In this example, I adopt the standard abbreviations: s for sentence, np for noun phrase, vp for verb phrase, and det for determiner.

[Figure 1. ILLUSTRATION OMITTED]

It is generally accepted that finding the sort of structure shown in figure 1 is useful in determining the meaning of a sentence. Consider a sentence such as "salespeople sold the dog biscuits." Figure 2 shows two structures for this sentence. Note that the two have different meanings: On the left, the salespeople are selling dog biscuits, but on the right, they are selling biscuits to dogs. Thus, finding the correct parse corresponds to determining the correct meaning.

[Figure 2. ILLUSTRATION OMITTED]

Figure 2 also exemplifies a major problem in parsing, syntactic ambiguity--sentences with two or more parses. In such cases, it is necessary for the parser (or the understanding system in which the parser is embedded) to choose the correct one among the possible parses.

However, this example is misleading in a fundamental respect: It implies that we can assign at least a semiplausible meaning to all the possible parses. For most grammars (certainly for the ones statistical parsers typically deal with), this is not the case. Such grammars would assign dozens, possibly hundreds, of parses to this sentence, ranging from the reasonable to the uninterpretable, with the majority at the uninterpretable end of things. To take but one example, a grammar I have been using has the rule

np → np np.

This rule would be used in the analysis of a noun phrase such as "10 dollars a share," where the two nps "10 dollars" and "a share" are part of the same np. The point here is that this rule would allow the third parse of the sentence shown in figure 3, and this parse has no obvious meaning associated with it--the best I can do is an interpretation in which "biscuits" is the name of the dog. In fact, most of the parses that wide-coverage grammars find are like this one--pretty senseless.

[Figure 3. ILLUSTRATION OMITTED]
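To make the three competing analyses concrete, here is a minimal sketch that enumerates every parse of "salespeople sold the dog biscuits" under a toy grammar. The grammar, the lexicon, and the function name `parses` are assumptions made up for this illustration; they are not the wide-coverage grammar referred to above, which would produce many more parses.

```python
from functools import lru_cache

# Hypothetical toy grammar (an assumption for illustration only).
# The last np rule is the "np -> np np" rule discussed in the text.
GRAMMAR = {
    "s": [("np", "vp")],
    "vp": [("verb", "np"), ("verb", "np", "np")],
    "np": [("noun",), ("det", "noun"), ("det", "noun", "noun"), ("np", "np")],
}

# Hypothetical lexicon: one part of speech per word, to keep the example small.
LEXICON = {
    "salespeople": "noun",
    "sold": "verb",
    "the": "det",
    "dog": "noun",
    "biscuits": "noun",
}


def parses(words):
    """Return every parse of `words` as a bracketed string."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(symbol, start, end):
        # All trees rooted in `symbol` that cover words[start:end].
        if symbol not in GRAMMAR:
            # Preterminal: must cover exactly one word with a matching tag.
            if end - start == 1 and LEXICON.get(words[start]) == symbol:
                return (f"({symbol} {words[start]})",)
            return ()
        found = []
        for rhs in GRAMMAR[symbol]:
            for children in expand(rhs, start, end):
                found.append(f"({symbol} " + " ".join(children) + ")")
        return tuple(found)

    def expand(rhs, start, end):
        # Yield lists of subtrees, one per symbol in `rhs`, that together
        # cover words[start:end] from left to right.
        if len(rhs) == 1:
            for tree in trees(rhs[0], start, end):
                yield [tree]
            return
        # The first child takes words[start:mid]; every remaining symbol
        # must still be left with at least one word.
        for mid in range(start + 1, end - len(rhs) + 2):
            for first in trees(rhs[0], start, mid):
                for rest in expand(rhs[1:], mid, end):
                    yield [first] + rest

    return list(trees("s", 0, len(words)))


found = parses("salespeople sold the dog biscuits".split())
for tree in found:
    print(tree)
```

Under these assumptions the sketch finds three parses: the two readings of figure 2 and the np → np np reading of figure 3. Nothing in the grammar itself says which of the three is correct, which is the gap a statistical parser is meant to fill.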

A usually unstated, but widely accepted, assumption in the nonstatistical community has it that some comparatively small set of parses for a sentence are legitimate ambiguities and that these parses have interpretations associated with them, albeit pretty silly ones sometimes. Furthermore, it is assumed that deciding between the legitimate parses is the responsibility not of the parser but, rather, of some syntactic disambiguation unit working either in parallel with the parser or as a postparsing process. Thus, our hypothetical nonstatistical traditionalist might say that the parser must rule out the structure in figure 3 but would be within its rights to remain undecided between those in figure 2.

By contrast, statistical parsing researchers assume that there is a continuum and that the only distinction to be drawn is between the correct parse and all the rest. The fact that we were able to find some interpretation for the parse in figure 3 supports this continuum view. To put it another way, in this view of the problem, there is no difference between parsing on the one hand and syntactic disambiguation on the other: it's parsing all the way down.

Part-of-Speech Tagging

The view of disambiguation as inseparable from parsing is well illustrated by the first natural language-processing task to receive a thoroughgoing statistical treatment--part-of-speech tagging (henceforth, just tagging). A tagger assigns to each word in a sentence the part of speech that it assumes in the sentence. Consider the following example:

The     can             will            rust
det*    modal-verb      modal-verb*     noun
        noun*           noun            verb*
        verb            verb

Under each word, I give some of its possible parts of speech in order of frequency; the correct tag is marked with an asterisk. …
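The tagging example can be sketched in a few lines. The dictionary below copies the candidate tags from the example above, in the order given. The two helper functions are assumptions added for illustration: one picks the most frequent tag for each word, a simple baseline and not necessarily the method the article goes on to describe, and the other lists every tag sequence a tagger has to choose among.

```python
from itertools import product

# Candidate tags per word, most frequent first, as listed in the example.
POSSIBLE_TAGS = {
    "the": ["det"],
    "can": ["modal-verb", "noun", "verb"],
    "will": ["modal-verb", "noun", "verb"],
    "rust": ["noun", "verb"],
}


def most_frequent_tags(words):
    # Illustrative baseline (an assumption): ignore context entirely and
    # take the first-listed, that is most frequent, tag for each word.
    return [POSSIBLE_TAGS[word.lower()][0] for word in words]


def all_taggings(words):
    # Every combination of candidate tags, one tag per word.
    return list(product(*(POSSIBLE_TAGS[word.lower()] for word in words)))


sentence = ["The", "can", "will", "rust"]
baseline = most_frequent_tags(sentence)
candidates = all_taggings(sentence)

print(baseline)
print(len(candidates))
```

On this sentence the context-free baseline tags "can" as a modal verb and "rust" as a noun, whereas the natural reading is det, noun, modal-verb, verb. Choosing that sequence out of all the candidates requires looking at neighboring words, which is why tagging is itself a disambiguation problem.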
