Use of the Normalized Word Vector Approach in Document Classification for an LKMC
Parker, Kevin R., Williams, Robert, Nitse, Philip S., Tay, Albert S. M., Issues in Informing Science & Information Technology
Many small businesses and entrepreneurs lack the time and resources to properly conduct competitive intelligence (CI) activities. Operational issues take up most of the owners' time and leave few other resources to devote to CI activities. There is, therefore, an opportunity for an outside entity to provide the needed services that will enable the small business to compete effectively. The concept of Libraries as Knowledge Management Centers (LKMC) was proposed to address problems faced by small businesses on one front, and by libraries on another (Parker, Nitse, & Flowers, 2005). At the core of the proposal is the premise that libraries can extend their services to act as knowledge management (KM) centers for small businesses, providing both KM and CI support. The arrangement would be beneficial both to libraries and to small businesses. Libraries benefit because it is an opportunity to reaffirm their relevance in a digital age in which so much information is freely available to patrons and library funding is deteriorating (ALA, 2004). Small businesses benefit because they are often unable to gather sufficient internal and external knowledge to assist in strategic planning and positioning, and thus are unable to compete with larger rivals whose resources allow them to develop sophisticated KM and CI systems. LKMCs hold promise to help level the playing field.
The seminal paper (Parker et al., 2005) enumerated the requirements that must be met for libraries to expand their services to act as KM centers for small businesses. This paper describes a single phase of the study: an investigation of document classification techniques for classifying and cataloging digital documents for an LKMC. One of the linchpins of the LKMC is the ability to locate and retrieve pertinent information quickly. Therefore, accurate and efficient document categorization is an essential first step in the realization of an LKMC. The following section lays out the components of an LKMC and briefly explains each.
Components of a Library Knowledge Management Center
As noted earlier, the seminal paper (Parker, Nitse, & Flowers, 2005) enumerated the requirements that an LKMC must meet if library services are to expand to include KM and CI offerings for small businesses. First, some businesses use a particular jargon, and if such businesses are to be served by the LKMC, then appropriate domain ontologies must be developed. Second, automatic document classification must be available to determine the content of both existing digital documents and new documents that are delivered continuously by streaming information sources. Third, library indexing or cataloging systems must be modified to incorporate conceptual details about documents so that Semantic Web technology can be used to link the library's resources semantically, making related documents easier to retrieve and deliver. Each of these components will be briefly considered.
A domain ontology is a clearly stated formal specification of the basic elements (objects, concepts, and relationships) that are known to exist in some area of interest. Specific domains can be identified and a common ontology can be defined to map vocabularies of specified terms to generally accepted definitions (Gruber, 1991). Tools like the Ontolingua Server are available to assist in the development of ontologies (Farquhar, Fikes, & Rice, 1997). Building a domain ontology for a specific business type requires a thorough understanding of the domain. Therefore, the process should start by identifying general terms common to all small businesses, and then narrow the focus to a specific business in order to determine common industry terms, organization-specific terms, and even project-specific terms. A complete domain ontology spans a wide spectrum of corporate interests, thus providing the means to identify a greater percentage of relevant information. …
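The layered narrowing described above, from general terms down to project-specific terms, can be illustrated with a minimal Python sketch. This is not the structure used by the Ontolingua Server or by the authors. The class name `DomainOntology`, the layer names, and the sample terms are all hypothetical, and relationships between terms are left out for brevity.

```python
import re
from dataclasses import dataclass, field

# Hypothetical layer names, ordered from most general to most specific,
# following the narrowing order described in the text.
LAYERS = ("general", "industry", "organization", "project")


@dataclass
class DomainOntology:
    # Maps each layer name to a {term: definition} vocabulary.
    terms: dict = field(default_factory=lambda: {layer: {} for layer in LAYERS})

    def add_term(self, layer, term, definition):
        """Register a term and its definition in the given layer."""
        if layer not in self.terms:
            raise ValueError(f"unknown layer: {layer}")
        self.terms[layer][term.lower()] = definition

    def lookup(self, term):
        """Return (layer, definition) from the most specific layer defining the term, or None."""
        key = term.lower()
        for layer in reversed(LAYERS):
            if key in self.terms[layer]:
                return layer, self.terms[layer][key]
        return None

    def matched_terms(self, text):
        """Return the single-word ontology terms that occur in the text."""
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        known = set()
        for vocabulary in self.terms.values():
            known.update(vocabulary)
        return words & known


# Illustrative usage with invented terms and definitions.
ontology = DomainOntology()
ontology.add_term("general", "invoice", "A request for payment.")
ontology.add_term("industry", "invoice", "A request for payment issued at delivery.")
ontology.add_term("project", "kiosk", "The in-store ordering terminal.")
print(ontology.lookup("Invoice"))
print(ontology.matched_terms("The kiosk prints an invoice."))
```

Looking a term up from the most specific layer first reflects the idea that an organization-specific or project-specific meaning should take precedence over the general one. Matching is limited to single words here, so a fuller version would also need to handle multi-word terms.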