An Overview of Empirical Natural Language Processing

By Brill, Eric; Mooney, Raymond J | AI Magazine, Winter 1997

In recent years, there has been a resurgence in research on empirical methods in natural language processing. These methods employ learning techniques to automatically extract linguistic knowledge from natural language corpora rather than require the system developer to manually encode the requisite knowledge. The current special issue reviews recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction, and machine translation. This article presents an introduction to the series of specialized articles on these topics and attempts to describe and explain the growing interest in using learning methods to aid the development of natural language processing systems.

One of the biggest challenges in natural language processing is how to provide a computer with the linguistic sophistication necessary for it to successfully perform language-based tasks. This special issue presents a machine-learning solution to the linguistic knowledge-acquisition problem: Rather than have a person explicitly provide the computer with information about a language, the computer teaches itself from online text resources.

A Brief History of Natural Language Research

Since its inception, one of the primary goals of AI has been the development of computational methods for natural language understanding. Early research in machine translation illustrated the difficulties of this task with sample problems such as translating the word pen appropriately in "The box is in the pen" versus "The pen is in the box" (Bar-Hillel 1964). It was quickly discovered that understanding language required not only lexical and grammatical information but also semantic, pragmatic, and general world knowledge. Nevertheless, during the 1970s, AI systems were developed that demonstrated interesting aspects of language understanding in restricted domains such as the blocks world (Winograd 1972) or answering questions about a database of information on moon rocks (Woods 1977) or airplane maintenance (Waltz 1978). During the 1980s, there was continuing progress on developing natural language systems using hand-coded symbolic grammars and knowledge bases (Allen 1987). However, developing these systems remained difficult, requiring a great deal of domain-specific knowledge engineering. In addition, the systems were brittle and could not function adequately outside the restricted tasks for which they were designed.

Partially in reaction to these problems, in recent years, there has been a paradigm shift in natural language research. The focus has shifted from rationalist methods, based on hand-coded rules derived to a large extent through introspection, to empirical, or corpus-based, methods, in which development is much more data driven and is at least partially automated by using statistical or machine-learning methods to train systems on large amounts of real language data. These two approaches are characterized in figures 1 and 2.

Empirical and statistical analyses of natural language were previously popular in the 1950s when behaviorism was thriving in psychology (Skinner 1957), and information theory was newly introduced in electrical engineering (Shannon 1951). Within linguistics, researchers studied methods for automatically learning lexical and syntactic information from corpora, the goal being to derive an algorithmic and unbiased methodology for deducing the structure of a language. The main insight was to use distributional information, such as the environment a word can appear in, as the tool for language study. By clustering words and phrases based on the similarity of their distributional behavior, a great deal could be learned about a language (for example, Kiss [1973], Stolz [1965], Harris [1962], Chatman [1955], Harris [1951], and Wells [1947]). Although the goal of this research was primarily to gain insight into the structure of different languages, this framework parallels that of modern empirical natural language processing: Given a collection of naturally occurring sentences as input, algorithmically acquire useful linguistic information about the language. …
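To make the distributional idea concrete, the following is a minimal sketch in Python, not a reconstruction of any method from the cited work. It assumes a tiny invented corpus, represents each word by counts of its immediate left and right neighbors, and compares words by cosine similarity. The function names `context_vectors` and `cosine`, the boundary markers, and the toy sentences are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left and right neighbors."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Hypothetical boundary markers so that edge words also get contexts.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Invented toy corpus; a real study would use a large natural corpus.
corpus = [
    "the cat sleeps",
    "the dog sleeps",
    "a cat eats",
    "a dog eats",
    "the cat eats",
]

vectors = context_vectors(corpus)
sim_noun = cosine(vectors["cat"], vectors["dog"])       # words sharing contexts
sim_mixed = cosine(vectors["cat"], vectors["sleeps"])   # words sharing no contexts
print(sim_noun > sim_mixed)
```

In this toy setting, "cat" and "dog" occur in the same environments and so score as more similar to each other than either does to "sleeps". Grouping words by such similarity scores is one simple way to approximate the clustering by distributional behavior that the paragraph describes.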
