Project Halo Update - Progress toward Digital Aristotle
Gunning, David, Chaudhri, Vinay K., Clark, Peter, Barker, Ken, Chaw, Shaw-Yi, Greaves, Mark, Grosof, Benjamin, Leung, Alice, McDonald, David, Mishra, Sunil, Pacheco, John, Porter, Bruce, Spaulding, Aaron, Tecuci, Dan, Tien, Jing, AI Magazine
In the winter 2004 issue of AI Magazine, we reported Vulcan Inc.'s first step toward creating a question-answering system called Digital Aristotle. The goal of that first step was to assess the state of the art in applied knowledge representation and reasoning (KRR) by asking AI experts to represent 70 pages from the advanced placement (AP) chemistry syllabus and to deliver knowledge-based systems capable of answering questions from that syllabus. This article reports the next step toward realizing a Digital Aristotle: we present the design and evaluation results for a system called AURA, which enables domain experts in physics, chemistry, and biology to author a knowledge base and then allows a different set of users to ask novel questions against that knowledge base. These results represent a substantial advance over what we reported in 2004, both in the breadth of subjects covered and in making sophisticated knowledge representation and reasoning, natural language processing, and question-answering technologies available to domain experts and novice users.
Project Halo is a long-range research effort sponsored by Vulcan Inc., pursuing the vision of the "Digital Aristotle" - an application containing large volumes of scientific knowledge and capable of applying sophisticated problem-solving methods to answer novel questions. As this capability develops, the project focuses on two primary applications: a tutor capable of instructing and assessing students and a research assistant with the broad, interdisciplinary skills needed to help scientists in their work. Clearly, this goal is an ambitious, long-term vision, with Digital Aristotle serving as a distant target for steering the project's near-term research and development.
Making the full range of scientific knowledge accessible and intelligible to a user might involve anything from the simple retrieval of facts to answering a complex set of interdependent questions and providing user-appropriate justifications for those answers. Retrieval of simple facts might be achieved by information-extraction systems searching and extracting information from a large corpus of text. But, to go beyond this, to systems that are capable of generating answers and explanations that are not explicitly written in the texts, requires the computer to acquire, represent, and reason with knowledge of the domain (that is, to have genuine, internal "understanding" of the domain).
Reaching this ambitious goal requires research breakthroughs in knowledge representation and reasoning, knowledge acquisition, natural language understanding, question answering, and explanation generation. Vulcan decided to approach this ambitious effort by first developing a system capable of representing and reasoning about introductory, college-level science textbooks, specifically, a system to answer questions on advanced placement (AP) exams.1
Question answering has long challenged the AI field, and several researchers have proposed question answering against college-level textbooks as a grand challenge for AI (Feigenbaum 2003, Reddy 2003). Project Halo, described in this article, provides an essential component to meet that challenge - a tool for representing and using textbook knowledge for answering questions by reasoning.
As an initial, exploratory step toward this vision, Vulcan initiated the Halo Pilot in 2002 - a six-month effort to investigate the feasibility of creating a scientific knowledge base capable of answering novel questions from an AP (first-year, college-level) chemistry test. Three teams - SRI International, Cycorp, and Ontoprise - developed knowledge bases for a limited section of an AP chemistry syllabus. The knowledge bases could correctly answer between 30 and 50 percent of the associated questions from the AP test (Friedland et al. 2004a, 2004b).
While encouraging, these results had limitations. Only a small subset of knowledge, from one domain, was tested - leaving the question of how well the techniques would generalize to other material and other domains. Knowledge representation experts, rather than domain experts, had encoded the knowledge bases, making large-scale implementation impractical. Also, all test questions were translated manually from natural language into formal logic (also by knowledge representation experts), not addressing the problem of question formulation by typical users.
In 2004, Vulcan initiated Halo Phase II with the goal of developing tools to enable subject matter experts (SMEs) (such as chemists, biologists, and physicists) to formulate the knowledge and tools to enable less-experienced domain users, such as undergraduates in these disciplines, to formulate questions to query that knowledge. Again, multiple teams were awarded contracts to design and prototype knowledge-formulation and question-formulation tools suited for domain experts. The system that emerged as the best of these attempts, and the one described in the rest of this article, is the Automated User-Centered Reasoning and Acquisition System (AURA), which was developed by SRI International, the University of Texas at Austin, and the Boeing Company, with Professor Bonnie John from Carnegie Mellon University serving as consultant.
In Halo Phase II, the goal was developing a software system that enabled domain experts to construct declarative knowledge bases in three domains (physics, chemistry, and biology) that could answer AP-like questions posed in natural language. The AURA team analyzed the knowledge representation and question-answering requirements; crafted a user-centered design; implemented an initial system prototype; conducted an intermediate evaluation in 2006; developed a refined version of the AURA system; and conducted a final evaluation of the system in 2008 and 2009. This article summarizes that system and its evaluation.
AURA System Development
The concept of operation for AURA is as follows: a knowledge-formulation (KF) SME, with at least a graduate degree in the discipline of interest, undergoes 20 hours of training to enter knowledge into AURA; a different person, a question-formulation (QF) SME, with at least a high-school-level education, undergoes 4 hours of training and asks questions of the system. Knowledge entry is inherently a skill-intensive task and, therefore, requires more advanced training in the subject as well as training in using the system. The questioner is a potential user of the system, and we required less training for this position because we wanted as low a barrier as possible to system use.
We chose the domains of college-level physics, chemistry, and biology because they are fundamental hard sciences, and because they also stress different kinds of representations. The AP test was established as the evaluation criterion to assess progress. Textbooks were selected that covered the AP syllabus for physics (Giancoli 2004), chemistry (Brown et al. 2003), and biology (Campbell and Reece 2001). A subset of each AP syllabus was selected that covered roughly 60 pages of text and 15-20 percent of the AP topics for each domain. The AURA team was challenged to design and develop a system that could fulfill the concept of operations for the selected AP material.
Overall Design and Requirements Analyses
The initial design requirements were determined by conducting a series of three analyses (Chaudhri et al. 2007, Chaudhri et al. 2010): (1) a domain analysis of textbooks and AP exams in the three domains; (2) a user-needs analysis of the domain expert's requirements for formulating knowledge; and (3) an analysis of a user's question-formulation requirements.
The domain analysis identified the four most frequent types of knowledge representation needed in these three domains. These four types of knowledge contribute to answering approximately 50 percent of the AP questions (in order of importance): conceptual knowledge, equations, diagrams, and tables. (1) Conceptual knowledge represents classes, subclasses, slots, slot constraints, and general rules about class instances. (2) A majority of questions in physics and some questions in chemistry involve mathematical equations. (3) All three domains make extensive use of diagrams. (4) Tables are often used to show relationships not repeated elsewhere in text.
A knowledge-formulation system was designed to accommodate these four knowledge types, but the module for diagram knowledge has not yet been implemented. Subsequent analyses were conducted to catalog the additional KRR challenges in each domain that will be discussed later.
The user-needs analyses showed three main areas of concern for knowledge formulation by domain experts who are not trained in KRR: (1) knowing where to begin is often challenging for domain experts (the blank slate problem); (2) knowledge formulation consists of a complete life cycle that includes initial formulation, testing, revision, further testing, and question answering; and (3) the system should place a high value on usability to minimize required training.
The users asking questions are different from the users who enter knowledge, and the training requirements must be kept minimal because we cannot assume that the questioner will have an intimate familiarity with the knowledge base or the knowledge-formulation tools. Because the questioner must specify a wide variety of questions, including problem-setup scenarios in some questions, we could not use a rigid interface; instead, we adopted an approach based on natural language input.
We analyzed the English text of AP questions in all three domains (Clark et al. 2007). The language of science questions involves a variety of linguistic phenomena. We identified 29 phenomena and their frequency of occurrence (Clark et al. 2007). For example, approximately 40 percent of questions used direct anaphora, 50 percent used indirect anaphora, and 60 percent used prepositional phrases. This data served as the basis for the question-formulation language design of AURA.
For the current phase of development, we consciously chose to not leverage any methods for automatic reading of the textbook for the following reasons: First, we expected the system challenges to be significant without introducing a language-understanding component. Second, for the detailed knowledge representation and reasoning needed to answer AP questions in all three domains, we did not expect any automatic technique to approach the needed representation fidelity. Finally, for knowledge that involves computations and diagrams, as in physics and chemistry, we did not expect fully automatic methods to be very effective. The AURA architecture does include provisions to import information from external sources, such as semantic web sources or well-developed ontologies, that might have been created automatically (Chaudhri et al. 2008).
AURA System Architecture
The AURA system has three broad classes of functionality: knowledge formulation; question formulation; and question answering. In addition, there is a training program for both KF and QF, which was developed over several years of experience training domain experts for both roles. In figure 1, we show the overall system architecture. Figure 2 illustrates a domain expert working with AURA.
Knowledge Representation and Reasoning
AURA uses the Knowledge Machine (KM) as its core knowledge representation and reasoning engine, a powerful, mature, frame-based knowledge representation system.2 Though KM is comparable to many state-of-the-art representation and reasoning systems, there are two features that are distinctive and have played a special role in AURA: prototypes and unification mapping (or UMAP).
A prototype represents the properties of all members of a concept using a notional example of that concept. The syntax of a prototype is a graph data structure, depicting the properties of that notional example as a set of interconnected nodes and relations (see later figures for examples). The use of a graph-based representation is highly significant as it means that the internal form and its presentation to the user are the same, allowing the user to view and modify the representation directly through graph manipulation, rather than editing logical axioms that would encode the same knowledge. The semantics of a prototype have a formal axiomatic specification, asserting that all individuals of that concept have the properties of the notional example.
Syntactically, during reasoning, to infer the properties of an individual, KM merges, or "unifies," with that individual all the prototype graphs of the concepts to which the individual belongs, thus constructing a graph-based representation of the individual with all the properties of its concepts' prototypes. Semantically, this operation of unifying two individuals, called UMAP, simply equates them and then recursively, conditionally unifies the values of their properties. Two property values are unified if either they must deductively be the same (for example, due to cardinality constraints) or they heuristically appear to be the same (for example, they are of the same type). The latter use of equality heuristics distinguishes UMAP from ordinary equality and allows KM to draw plausible inferences in an underspecified knowledge base, filling in details that an SME might leave out. Although in principle UMAP can make mistakes (as it is unsound), in practice this is rare and significantly outweighed by its advantages in replicating the kind of equalities that a person would naturally assume. We give an example of the use of UMAP in the next section.
Both prototypes and UMAP were first used in the context of a system called SHAKEN, which was developed as part of the U.S. Defense Advanced Research Project Agency's Rapid Knowledge Formation program (Clark et al. 2001). The positive result from this prior work was the basis for including them as a central design feature in the AURA system.
Our approach to knowledge formulation includes three salient features: (1) the use of a document as a starting point and context for all knowledge entry; (2) a prebuilt library of components that provides the starting point for any KF process; and (3) the choice of user-interface abstractions that are driven by a usability analysis and the native representations of knowledge within a textbook. We discuss each of these aspects of KF in greater detail.
We embed an electronic copy of each of the three textbooks into the user interface of AURA to serve two purposes: First, it helps specify the context and the scope of the knowledge to be entered. Second, a semantic search facility based on WordNet (Fellbaum 1998) mappings from words in the document to concepts in the knowledge base serves as the basis of making suggestions for concepts relevant for encoding that word.
The SMEs build their knowledge bases by reusing representations in a domain-independent knowledge base called the Component Library or CLIB (Barker, Porter, and Clark 2001). The Component Library is built by knowledge engineers (KEs) and contains domain-independent classes such as Attach, Penetrate, Physical Object; predefined sets of relations such as agent, object, location; and property values to help represent units and scales such as size or color. These classes and relations and their associated axioms provide a starting point to the SMEs in the KF process. A selection of top-level classes in CLIB is shown in figure 3.
To capture the most frequently occurring knowledge types identified earlier, we settled on the following user-interface elements: directed graphs for structured objects (concept maps) and logical rules and equations for mathematical expressions. To enhance the usability of the system, we implemented interfaces for chemical reactions and tabular data. We expected that this capability would enable users to encode knowledge sufficient to answer approximately 50 percent of the AP questions in all three domains. A detailed account of these choices and the underlying theory is available elsewhere (Chaudhri et al. 2007).
As an example, in figure 4, we show a (simplified) representation of the concept of a eukaryotic cell. The node labeled as Eukaryotic-Cell is the root of the graph and is a prototypical individual of that class. The gray nodes represent nonroot individuals in the graph; the unboxed words such as has-part are relations between individuals and are shown as the labels on the edges. Logically, the graph denotes a collection of rules that assert that for every instance of Eukaryotic-Cell, there exist instances of each node type shown in this graph, and that they are related to each other using the relations in the graph. Examples of specific logical forms generated are included in a later section of the article.
From a logical point of view this rule could be broken into multiple rules, for example, each rule stating the existence of a part, and another rule stating their relationships. The prototypes combine multiple rules into a single rule to provide a coarser granularity of knowledge acquisition. Abstraction offered by prototypes, and the fact that a prototype mirrors the structure of a concept map as seen by a user, contributed to enabling the domain experts to author knowledge.
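A prototype of this kind can be sketched as a first-order rule. The following fragment is illustrative only (it shows a single part, with names following figure 4; the full rule conjoins one existential per node and the relations among them):

```latex
\forall x \,\big(\, \text{Eukaryotic-Cell}(x) \;\rightarrow\;
  \exists n \,(\, \text{Nucleus}(n) \wedge \text{has-part}(x, n) \,)\big)
```

The prototype packages all such existentials, and the relations among the introduced individuals, into one rule, which is what gives knowledge acquisition its coarser granularity.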
As an example of a process in biology, in figure 5, we show a (simplified) concept map for mitosis. This concept map shows the different steps in mitosis (prophase, metaphase, and so on), their relative ordering, and that its object is a diploid cell and its result is two diploid cells. The numbers shown next to a lock symbol in the relations, such as result, represent the cardinality constraints. For example, the result of mitosis is exactly two diploid cells. The current AURA system supports such declarative descriptions and reasoning about processes, but does not currently support running process simulations.
The SMEs create the concept maps using four primary graph-manipulation operations: (1) adding a new individual to a graph; (2) specializing an individual to be an instance of a more specific class; (3) connecting two individuals using a set of predefined relations; and (4) equating two individuals. Equating two individuals uses UMAP. As an illustration of UMAP, in figure 6, we show the concept of H2O (or water) from chemistry. The top part of this graph encodes that every individual instance of H2O has-part an OH- ion and an H+ ion, and further that an H+ ion has-atom an H. The lower part of the graph shows another H2O individual that is added to this graph. If the user equates the two H2O individuals in this graph, the UMAP operation will recursively equate the H+ and OH- individuals related by has-part, and the H related by the has-atom relation. This inference is heuristic and plausible; for it to follow deductively, the SME would need to encode cardinality constraints on the has-part and has-atom relations. UMAP can draw equality inferences even when the knowledge base is underspecified in that the cardinality constraints are not given. In some cases, all the cardinality constraints are not known; in others, adding cardinality constraints would be incorrect. The ability of UMAP to work with such underspecification substantially contributed to the usability of AURA's concept map-editing interface.
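The recursive, type-based unification behind this example can be sketched in a few lines of Python. This is a minimal illustration, not KM's actual implementation: individuals are represented as dictionaries with a "type" and a "slots" map from relation names to lists of values, and two values are heuristically unified when they share the same type.

```python
# Illustrative sketch of UMAP-style recursive unification (not the actual
# KM implementation). Individuals are dicts with a "type" and, optionally,
# "slots" mapping relation names to lists of child individuals.

def umap(a, b):
    """Equate individuals a and b, merging their property graphs in place."""
    assert a["type"] == b["type"], "only individuals of the same type unify"
    for rel, b_vals in b.get("slots", {}).items():
        a_vals = a.setdefault("slots", {}).setdefault(rel, [])
        for bv in b_vals:
            # Heuristic: unify with an existing value of the same type,
            # otherwise add it as a new value of the relation.
            match = next((av for av in a_vals if av["type"] == bv["type"]), None)
            if match is not None:
                umap(match, bv)          # recurse into property values
            else:
                a_vals.append(bv)
    return a

# The H2O example from the text: equating two H2O individuals also equates
# their H+ parts and the H atom, even without cardinality constraints.
h2o_1 = {"type": "H2O", "slots": {
    "has-part": [{"type": "OH-"},
                 {"type": "H+", "slots": {"has-atom": [{"type": "H"}]}}]}}
h2o_2 = {"type": "H2O", "slots": {
    "has-part": [{"type": "H+", "slots": {"has-atom": [{"type": "H"}]}}]}}

merged = umap(h2o_1, h2o_2)
```

After the merge, the graph still has exactly one OH- and one H+ part, mirroring the equalities a reader of the figure would naturally assume.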
As a final example of a concept formulated using AURA, in figure 7, we show a concept map for Free Fall. The concept map encodes different properties of Free Fall and the mathematical equations that relate them. The property values are shown in green ovals, and the mathematical equations are shown in green squares. AURA supports a "what you see is what you get" editor for entering equations, and the equations can be related to properties that are represented in the knowledge base.
We have designed a training course for SMEs that prepares them to enter knowledge into AURA. The current KF training is approximately 20 hours. The training introduces the SMEs to the mechanics of using the system and to basic knowledge engineering principles. In the knowledge engineering section of the training, the SMEs learn about different classes and relations in CLIB, and how to use them. The training program includes several hands-on exercises in which SMEs encode knowledge and are given feedback on their specific choices. The core of the training program is common across all three domains. There are, however, several domain-specific modules. For example, physics SMEs must learn to properly use vector math, which does not arise in the other two domains. For chemistry, the SMEs must learn about entering chemical compounds and reactions, and about chemistry-specific, system-available knowledge. For biology SMEs, there is an added emphasis on learning about describing processes.
Recall that the users asking questions are different from the users who enter knowledge, and that the training requirements must be kept low. Further, we cannot assume that the questioner will have an intimate familiarity with the knowledge base or the knowledge-formulation tools. Our question-formulation design aims to account for these requirements.
While there has been considerable recent progress in question answering against a text corpus (for example, Voorhees and Buckland 2008), our context is somewhat different, namely posing questions to a formal knowledge base, where a complete, logical representation of the question is needed for the reasoner to compute an answer. In this context, the designer is typically caught between using "fill-in-the-blank" question templates (Clark et al. 2003), which severely restricts the scope of questions that can be posed, or attempting full natural language processing on questions, which is outside the reach of the current technology. In AURA, we have aimed for a "sweet spot" between these two extremes by using a controlled computer-processable language (a simplified version of English) called CPL for posing questions, with feedback mechanisms to help in the question-formulation process. Our hypothesis is that a controlled language such as CPL is both easily usable by people and reliably understandable by machines and that, with a small amount of training and good run-time feedback mechanisms, users can express their questions easily and effectively in that form.
A basic CPL sentence has the form
subject + verb + complements + adjuncts
where complements are obligatory elements required to complete the sentence, and adjuncts are optional modifiers. Users follow a set of guidelines while writing CPL. Some guidelines are stylistic recommendations to reduce ambiguity (for example, keep sentences short; use just one clause per sentence), while others are firm constraints on vocabulary and grammar (for example, words of uncertainty such as "probably" and "mostly" are not allowed, not because they cannot be parsed but because their representation is outside the scope of the final logical language). Examples of typical AP questions from the three domains, and a typical reformulation of them within CPL, are shown in figure 8. As shown, questions (especially in physics) may be multiple sentences divided into a "setup" describing a scenario and a "query" against that scenario. Multiple-choice questions are reexpressed in CPL as separate, full-sentence questions.
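A guideline checker of the kind described above can be sketched as follows. The word list and length limit here are assumptions for illustration; the real CPL grammar and vocabulary constraints are far richer.

```python
# Toy checker for two CPL-style guidelines mentioned in the text
# (illustrative only; CPL's actual grammar and vocabulary are richer).

UNCERTAINTY_WORDS = {"probably", "mostly", "perhaps"}   # assumed word list
MAX_WORDS = 20                                          # assumed length limit

def check_cpl(sentence):
    """Return a list of guideline violations for one candidate sentence."""
    problems = []
    words = sentence.lower().rstrip(".?").split()
    if len(words) > MAX_WORDS:
        problems.append("sentence too long; keep sentences short")
    banned = UNCERTAINTY_WORDS.intersection(words)
    if banned:
        problems.append("words of uncertainty not allowed: " + ", ".join(sorted(banned)))
    return problems

print(check_cpl("The ball probably falls."))
```

In AURA, a violation like this triggers a notification and rephrasing advice rather than a rejection, which is the feedback loop described next.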
To pose a question, the user first enters a CPL form of it in the interface. If a CPL guideline is violated, AURA responds with a notification of the problem, and advice about how to rephrase the question. If this happens, then the user rephrases the question, aided by a searchable database of example questions and their CPL equivalents, and a list of the vocabulary that CPL understands, and the process repeats. Alternatively, if the question is valid CPL, then AURA displays its interpretation in graphical form for the user to validate. An example of this graphical form is shown in figure 9, depicting how AURA interpreted the first example in figure 8 in terms of individuals, relationships, and the focus of the query (denoted by a question mark). If the interpretation appears incorrect, then the user would again rephrase the CPL to correct the problem. The graphical interface also allows a user to perform a limited number of edits, for example, changing a relation or asserting that two nodes are equal. Otherwise, the user instructs AURA to answer the question, invoking the query answering described in the next section.
Note that using a controlled language involves a trade-off between machine understandability and fidelity, that is, the process of making the question machine understandable may involve simplifying or expanding the original question's semantics. For many questions (for example, "Does a eukaryotic cell have a nucleus?") there is no loss of fidelity, but for more complex questions a more significant rewording may be needed. Example 2 in figure 8 illustrates this, where "What two molecules must always be present in the products...?" is reexpressed as "What are the products...?" In such cases there is some cognitive burden on the user to use his or her linguistic and general knowledge to simplify "wordy" English and know what simplifications are reasonable, combined with the user's knowledge of the kind of statements AURA understands, acquired from training and experience using the system. A controlled language represents a pragmatic middle ground, trying to balance the machine-understandability/fidelity trade-off. We evaluate the effectiveness of this later in this article.
Let us now consider how this design meets the requirements of the questioner. The CPL formulations expected of questioners are in terms of English words and, thus, do not require intimate knowledge of the knowledge base's vocabulary. To read the interpretation graph, the questioners must understand the meaning of the concepts and relations. Through AURA, the questioners can access the documentation of the classes and relations, and a vocabulary list of all classes and relations known to the system. The task of understanding the terms of the knowledge base by inspection is significantly easier than using those terms for creating new concepts as the SMEs are required to do. CPL also allows questioners to construct problem scenarios with respect to which a question is asked.
Once a question has been formulated to a user's satisfaction, AURA attempts to answer it. Conceptually, the question-answering module of AURA has four functional components: reasoning control, a reasoning engine, specialized reasoning modules, and explanation generation.
The reasoning control relates the individuals in the question interpretation to the concepts in the knowledge base, identifies the question type, and invokes the necessary reasoning. In some cases, relating an individual to a class in a knowledge base is straightforward, especially as AURA allows SMEs to associate words with the concepts that they create. In other cases, AURA must resort to specialized reasoning based on search and semantic matching (Clark et al. 2007, Chaw et al. 2009).
A question type denotes a style of formulation and reasoning used for answering a question. Currently supported question types are: computing a slot value, checking if an assertion is true or false, identifying superclasses, comparing individuals, describing a class, computing the relationship between two individuals, and giving an example of a class.
AURA uses the Knowledge Machine as its core reasoning engine. AURA has a special-purpose reasoning module for solving algebraic equations that is used extensively both in physics and chemistry. It has a graph-search utility to support the question type that computes relationships between two individuals. There is a chemistry-specific module aimed at recognizing chemical compounds and reactions, and a physics-specific module to support vector arithmetic.
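To make the role of the algebraic module concrete, here is a minimal numeric sketch of solving for one unknown in an equation given known values. This is an assumption-laden stand-in (a bisection search over a bracketing interval); AURA's actual module is symbolic and handles systems of equations, and the numbers below are invented for illustration.

```python
# Minimal sketch of an equation-solving helper in the spirit of AURA's
# algebraic reasoning module (illustrative only; the real solver is
# symbolic). Uses bisection, so the residual must change sign on [lo, hi].

def solve_for_unknown(equation, knowns, unknown, lo=-1e6, hi=1e6):
    """Find a value for `unknown` that makes equation(vars) == 0."""
    def residual(x):
        vars_ = dict(knowns, **{unknown: x})
        return equation(vars_)
    if residual(lo) * residual(hi) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(200):                 # halve the bracket repeatedly
        mid = (lo + hi) / 2.0
        if residual(lo) * residual(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Free-fall distance s = 0.5 * g * t**2, solved for t given s and g
# (the numbers here are made up, not taken from the article's example).
eq = lambda v: v["s"] - 0.5 * v["g"] * v["t"] ** 2
t = solve_for_unknown(eq, {"s": 44.1, "g": 9.8}, "t", lo=0.0, hi=100.0)
```

Reasoning control would select this module after identifying a compute-a-slot-value question and binding the question's individuals to the Free Fall concept's properties.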
Finally, AURA supports an incremental explanation system that produces explanations in (rudimentary) English. Some of the terms in the explanation are hyperlinked, and the user can drill down to obtain more information. As an example, in figure 10, we show the answer to the question shown as example 1 in figure 8.
AURA first presents an answer to the question (s = 111 m) followed by the explanation. In the explanation, AURA shows the equation and the specific variables used to solve the equation. In more complex questions that use more than one equation, the explanation includes the specific order in which the equations were solved.