Angi Voss, Keiichi Nakata, Marcus Juhnke, Thomas Schardt
GMD - German National Research Center for Information Technology
In today's electronic environments, knowledge is often captured only implicitly, in collections of documents such as email or news archives, bookmark lists, and shared workspaces. Insights drawn from these documents are seldom made explicit, although people who work in teams, form a group, or belong to a community would benefit greatly from sharing their knowledge. The effort required to extract the knowledge hidden in the documents and, later, to quickly find a suitable piece of it still outweighs the benefits of knowledge sharing and reuse.
We present an infrastructure for efficient and effective access to the knowledge in a document collection. It is called a "concept index" and addresses the following questions:
What is the content of the document collection? Users can introduce concepts and relations between them as a means for content-oriented navigation through the documents.
What do the concepts represent? Concepts are not formally defined; instead, they are assigned pieces of text that express them in the documents. All possible occurrences of a concept are spotted dynamically by matching similar text pieces across the document collection. Thus, each concept is indexed by its usage in the documents, which implicitly conveys its meaning.
What are the nuggets in the documents? Text pieces in a document that are assigned to the concepts can be seen as a source of shared knowledge. Document presentation can expose such nuggets by highlighting concept occurrences.
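The three questions above can be illustrated with a minimal sketch. It is not the implementation behind the concept index; the class name `ConceptIndex`, its methods, the bracket notation used for highlighting, and in particular the similarity measure (word-set overlap over sliding windows, with an arbitrary threshold) are assumptions made only for illustration.

```python
import re
from dataclasses import dataclass, field


def tokens(text):
    """Lowercased word tokens together with their character spans."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def jaccard(a, b):
    """Word-set overlap; an assumed stand-in for 'similar text pieces'."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


@dataclass
class ConceptIndex:
    # concept name -> text pieces that express it (no formal definition)
    examples: dict = field(default_factory=dict)
    # user-defined relations as (source, label, target) triples
    relations: set = field(default_factory=set)

    def assign(self, concept, text_piece):
        """Assign a text piece to a concept, introducing it if needed."""
        self.examples.setdefault(concept, []).append(text_piece)

    def relate(self, source, label, target):
        """Record a relation between two concepts for navigation."""
        self.relations.add((source, label, target))

    def spot(self, document, threshold=0.6):
        """Return sorted (start, end, concept) spans similar to a text piece."""
        doc_tokens = tokens(document)
        hits = set()
        for concept, pieces in self.examples.items():
            for piece in pieces:
                words = [w for w, _, _ in tokens(piece)]
                n = len(words)
                if n == 0:
                    continue
                # Slide a window of the same length over the document.
                for i in range(len(doc_tokens) - n + 1):
                    window = doc_tokens[i:i + n]
                    if jaccard(words, [w for w, _, _ in window]) >= threshold:
                        hits.add((window[0][1], window[-1][2], concept))
        return sorted(hits)

    def highlight(self, document, threshold=0.6):
        """Mark spotted occurrences as [concept: text]; skip overlapping spans."""
        out, pos = [], 0
        for start, end, concept in self.spot(document, threshold):
            if start < pos:
                continue
            out.append(document[pos:start])
            out.append(f"[{concept}: {document[start:end]}]")
            pos = end
        out.append(document[pos:])
        return "".join(out)


index = ConceptIndex()
index.assign("knowledge sharing", "sharing their knowledge")
index.relate("knowledge sharing", "requires", "shared workspace")
doc = "Teams profit from sharing their knowledge in workspaces."
print(index.highlight(doc))
```

Because occurrences are computed when a document is presented rather than stored, assigning a further text piece to a concept changes what is spotted in every document of the collection, which is one way to read "dynamically spotted" above.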