ABSTRACT. We propose a method for ranking short information nuggets extracted from a text corpus, using another, reliable reference corpus as a user model. We argue that the availability and use of such additional corpora is common in a number of IR tasks, and we apply the method to answering a form of definition question. The proposed ranking method substantially improves the performance of our system.
Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing--linguistic processing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--information filtering, search process; H.3.4 [Information Storage and Retrieval]: Systems and Software--question-answering (fact retrieval) systems; I.2.1 [Artificial Intelligence]: Applications and Expert Systems; I.2.7 [Artificial Intelligence]: Natural Language Processing
Keywords: Question Answering, Information Retrieval
1. INTRODUCTION
The area of Question Answering (QA) has recently attracted considerable research interest, both in the Information Retrieval (IR) community and among computational linguists. It is seen as one of the few applications to successfully combine techniques from Natural Language Processing and IR. The QA track at the annual Text REtrieval Conferences (TREC) has become an important factor in shaping and giving direction to QA research. Introduced in 1999, this track attracts a significant number of participants each year and provides a focal point for much modern QA research.
When the QA track at TREC was introduced, it focused on so-called "factoid" questions (typically having a short named entity as an answer), such as "How many people live in Tokyo?" or "When is the Tulip Festival in Michigan?". As the track evolved, it was argued that this type of question does not accurately model the needs of real users of QA technology. In addition to named entities as answers, users often search for definitions of concepts, or for summaries of important information about them. As a result, in 2003 TREC introduced definition questions: questions for which the answer is not a single named entity but a list of information nuggets. In the TREC 2004 QA track this was taken a step further. The questions were now clustered in small groups, each organized around a single topic. For example, the topic Concorde included questions such as "How many seats are in the cabin of a Concorde?" and "What airlines have Concordes in their fleets?". Finally, for every topic, the track guidelines required participants to supply "additional important information found in the corpus about the target, that was not explicitly asked." This last requirement has been dubbed "other" questions. In our view, the task presented at the TREC 2004 QA track, and in particular the introduction of "other" questions, is a big step towards more realistic user scenarios.
(1) According to our own analysis of web query logs, users tend to ask many more "knowledge gathering" questions than factoid questions about specific facts.
This new type of "other" question puts more emphasis on the user aspect of the QA process, an issue that has mostly been neglected in the QA community. The TREC criteria for what constitutes a good answer to a given question have so far been rather vague, but QA systems have dealt with this vagueness fairly effectively for factoid questions. With "other" questions, where systems are required to return only important information, there is an implicitly assumed user model that can discriminate between important and unimportant facts about a topic. For example, for the topic Clinton, his birthday might be considered important, while the day of the week on which he left Mexico probably is not. To give reasonable responses to "other" questions, a QA system needs to model such preferences.
We present an approach for answering "other" questions using an explicit user model. We describe a method for gathering important facts about an entity from a collection of documents and for ranking these facts by their importance to the user. We show that our ranking improves over plain retrieval of facts from the corpus. The core idea of our method is to estimate the importance of facts found in the target collection using external "reference" corpora: high-quality sources of information that model a user's ability to distinguish between important and unimportant facts. The proposed method is our first step towards user-oriented QA, and the underlying techniques need further refinement. We identify additional areas where this method is or may be helpful, and discuss its strengths, weaknesses and directions for further research.
The rest of the paper is organized as follows. In Section 2 we survey related work on answering definition questions and on using high-quality external sources. Section 3 describes the details of the re-ranking method. Our experiments and results follow in Section 4, and we conclude in Section 5.
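The idea of using a reference corpus as a stand-in for the user's sense of importance can be illustrated with a toy sketch. This is NOT the re-ranking method detailed in Section 3. It assumes, hypothetically, that a nugget's importance can be approximated by how frequently its content words occur in reference documents about the same target. The function names (`content_terms`, `reference_model`, `importance`, `rerank`), the small stop-word list, and the average log-frequency score are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustrative assumption only).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "was", "is", "he", "his",
              "to", "and", "as", "for", "when"}


def content_terms(text):
    """Lower-cased alphanumeric tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def reference_model(reference_docs):
    """Term frequencies over reference documents about the target (assumed input)."""
    counts = Counter()
    for doc in reference_docs:
        counts.update(content_terms(doc))
    return counts


def importance(nugget, ref_counts):
    """Hypothetical score: mean log(1 + frequency) of the nugget's terms in the reference corpus."""
    terms = content_terms(nugget)
    if not terms:
        return 0.0
    # Counter returns 0 for unseen terms, so words absent from the reference add nothing.
    return sum(math.log1p(ref_counts[t]) for t in terms) / len(terms)


def rerank(nuggets, reference_docs):
    """Order nuggets from most to least important under the toy reference model."""
    ref_counts = reference_model(reference_docs)
    return sorted(nuggets, key=lambda n: importance(n, ref_counts), reverse=True)
```

With toy reference text mentioning Clinton's birthplace, a nugget about his birth outranks one about the weekday on which he left Mexico, mirroring the example above. A term-overlap score like this is only the simplest possible proxy; it ignores synonymy and sentence structure.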
2. RELATED WORK