This article summarizes the results of the 6-7 July Workshop on Human Language Technology and Knowledge Management held in Toulouse, France. It describes invited keynotes, presentations, and results of brainstorming sessions to create a technology road map for this important area. The group also articulated grand challenges in human language technology and solutions to these challenges that could benefit facilities for knowledge discovery, access, and exploitation.
The Workshop on Human Language Technology and Knowledge Management was held on July 6 and 7 in Toulouse, France, in conjunction with the joint meeting of the Association for Computational Linguistics and the European Association for Computational Linguistics (ACL/EACL '01). Human language technologies promise solutions to challenges in human-computer interaction, information access, and knowledge management. Advances in technology areas such as indexing, retrieval, transcription, extraction, translation, and summarization offer new capabilities for learning, playing, and conducting business. These advances promise to support enhanced awareness, creation, and dissemination of enterprise expertise and know-how.
Organized by the European Network of Excellence in Human Language Technologies (Steven Krauwer, U. Utrecht) and The MITRE Corporation (Mark Maybury), the workshop brought together a group of 50 computational linguists, AI researchers, and computer scientists from North America, Europe, Asia, Australia, and South Africa. Participants work in a range of areas (for example, speech and language processing, translation, summarization, multimedia presentation, content extraction, dialog tracking). They met both to report advances in human language technology and their application to knowledge management and to work toward a road map for human language technologies for the next decade. In part, the workshop focused on human language technologies that could enable knowledge management functions such as the following:
Expert discovery: Modeling, cataloging, and tracking of distributed organizations and communities of experts
Knowledge discovery: Identification and classification of knowledge from unstructured multimedia data
Knowledge sharing: Awareness of, and access to, enterprise expertise and know-how
Table 1 (from Mark Maybury's introduction to the workshop) illustrates how these knowledge management functions are supported by a broad range of human language technologies, including query analysis-retrieval, information extraction, question answering, machine translation, agent-user modeling, summarization, presentation generation, and awareness-collaboration.
During the second day, John Domingue, deputy director of The Knowledge Media Institute at The Open University in England, gave the keynote entitled "Supporting Organizational Learning through the Enrichment of Documents." According to Domingue, only a small percentage of corporate training is ever applied within the workplace because organizations tend to use school-based methods of learning rather than organizational learning based on theories of learning in the workplace. Domingue described knowledge sharing by enriching web documents with informal and formal representations, a process that captures the context in which a document is created and applied. Domingue demonstrated how this enrichment facilitates retrieval and comprehension.
In addition, the group heard an invited talk from Hans Uszkoreit, scientific director at the German Research Center for Artificial Intelligence (DFKI) in Saarbrücken, head of the DFKI Language Technology Lab, and professor of computational linguistics at the Department of Computational Linguistics and Phonetics of Saarland University in Saarbrücken. Uszkoreit's talk was entitled "Crosslingual Language Technologies for Knowledge Creation and Knowledge Sharing." He described how "language technology can provide means for associating shared knowledge with the relevant decision situations by automatically linking it to the critical elements within decision triggers, that is, electronic documents in the work flow that demand and record a decision." Uszkoreit described the role of information extraction, automatic hyperlinking, and (human) inferencing in this process. He illustrated this "automatic relational hyperlinking" with a hypercode system developed for a large German bank to facilitate work with legacy code by densely interlinking source code and documentation. Uszkoreit concluded by addressing cross-lingual knowledge management, describing his efforts to augment general-purpose translation systems with specialized terminology and transfer rules for multilingual expert groups in a project for a large multinational automobile manufacturer.
A poster session included system demonstrations and offered participants an opportunity for rich dialogue and interaction. Major paper sessions were held on ontology construction, question answering, summarization, multilingual processing, multimedia processing, and dialogue. Group brainstorming sessions followed each major technology theme, focusing on construction of a road map. During these sessions, the group focused on an analysis of the present situation, a vision of where we want to be in the future, and a number of intermediate milestones that would help in setting interim goals and measuring progress toward them.
The group outlined key challenges and promising solutions in the areas of ontology, summarization, multilingual processing, and multimedia processing. With respect to ontologies, the group emphasized the need for tools and tasks that were reusable across domains to create and populate ontologies; the importance of a user-centered process view; the need to integrate shallow and deep methods; the need to collaborate with domain ontology creators; and the need to address ontology quality, ambiguity, and usability (for example, using tools for structuring, integrating, visualizing, and accessing massive or heterogeneous ontologies). The group highlighted the promise of the semantic web, the importance of information extraction "plug-ins," the possibility of organizing massive documents using domain-specific ontologies, the opportunity to use a "top" or core ontology to bootstrap new domains, the value of multidisciplinary collaborative teams (for example, domain experts, linguists, knowledge engineers), the value of controlled language management, and the promise of component-based methods to facilitate ontology decomposition, reuse, and life-cycle management.
With respect to summarization, the group outlined challenges as including the appropriate level-depth of analysis-representation (for example, semantic relations, speech acts, rhetorical structure), summarization presentation-visualization, speech for presentation of short summaries, the appropriate use of indicative versus informative summaries, and the need for action-oriented summaries (for example, executive-management summaries). The group discussed a range of solutions encompassing the analysis of information, its transformation (including operations such as selection, aggregation, abstraction), and its presentation.
The group also identified a number of fundamental multilingual challenges, including relations between cultures, languages, lexical resources, and ontologies; the importance of domain knowledge and the adaptation-integration of semantic resources; the complexity of dealing with one-to-one translation of even the 200 most spoken-written languages (requiring 39,800 directed language pairs, that is, 200 × 199); the need for large-scale, robust natural language processing and, at the same time, the importance of fine-grained linguistic knowledge; and the challenge of new application domains such as content-driven hypertextual authoring and cross-lingual news linking.
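The language-pair count mentioned above can be checked by counting ordered (source, target) pairs: n languages give n(n - 1) translation directions, which is 39,800 for n = 200. The short Python sketch below works through that arithmetic. The function names are illustrative only, and the 2n figure for a pivot-style design (one analyzer and one generator per language) is a standard textbook estimate, not a number reported at the workshop.

```python
def directed_pairs(n: int) -> int:
    """Ordered (source, target) pairs among n languages: n * (n - 1)."""
    return n * (n - 1)


def undirected_pairs(n: int) -> int:
    """Unordered language pairs, if one system served both directions."""
    return n * (n - 1) // 2


def pivot_components(n: int) -> int:
    """Assumed pivot design: one analyzer plus one generator per language."""
    return 2 * n


if __name__ == "__main__":
    n = 200  # the "200 most spoken-written languages" from the text
    print("directed pairs:  ", directed_pairs(n))
    print("undirected pairs:", undirected_pairs(n))
    print("pivot components:", pivot_components(n))
```

The quadratic growth of direct pairs against the linear growth of pivot components is one common way to see why one-to-one translation across many languages is hard to scale.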
The group identified resources (for example, WORDNET, EURONET, application databases, text resources) as key to advancement and the INTERLINGUA approach as promising. It also noted the importance of deeply annotated data combined with machine learning, the promise of translation memories and machine learning, and the possibility of tailoring multiple ontologies to users and their tasks.
Finally, the group turned its attention to multimedia challenges and opportunities. Challenges include the integration of multiple media; the nature of processing (whether it is centralized or mobile); the challenges of privacy, security, and scalability; the importance of both remembering and forgetting information; the need for multilingual and multisource information extraction; and the challenge of cross-document coreference resolution. Location-based services were highlighted as a promising future area.
Two cross-cutting enabling capabilities were identified for all the addressed areas. First is the need for (intelligent) text annotation. Second is the need for large-scale annotated corpora to enable automated training and system evaluation.
ELSNET has captured the workshop input and will continue to revise a technology road map. A web site to share the materials and results of the workshop has been set up.¹
Mark Maybury is executive director of MITRE's Information Technology Division in Bedford, Massachusetts. He is a member of the board of directors of the Object Management Group, secretary-treasurer of the Association for Computing Machinery SIGART, and a member of the Intelligent User Interfaces Steering Council. He has published over 50 technical and tutorial articles and is editor of a number of books. He received a doctorate in AI from Cambridge University in England in 1991 and is an international adviser to the German Ministry for Education and Research.…