Knowledge Discovery in Bibliographic Databases. (Book Reviews)

Knowledge Discovery in Bibliographic Databases. Ed. Jian Qin and M. Jay Norton. Library Trends 48, no. 1 (Summer 1999). Champaign: University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science, 1999. 281p. single copy, $18.50 (ISSN 0024-2594).

Knowledge discovery in databases (KDD) is one of those arcane information science topics that seem both mysterious and inviting to most librarians, bearing an aura of the future of librarianship. While being discussed in the major information science journals (e.g., Trybula 1997; Vickery 1997; Raghavan et al. 1998), it has not found its way into mainstream library science literature. If for no other reason then, the appearance of this issue of Library Trends is a welcome development, especially because of its focus on using KDD in bibliographic databases.

The papers comprising this book have been artfully assembled. The introduction and a useful overview of KDD are followed by an assessment of classification schemes, from the standpoint of knowledge discovery, as devices of knowledge representation. This link to bibliographic organizational practice yields in turn to two accounts of finding new knowledge by discovering connections, through common citations, between sets of articles in the biomedical and philosophical literatures. Next is a demonstration of using cocitation links to forge a pathway of relationships through the literatures of several subject areas from economics to astrophysics. There follow three articles on different aspects of discovering knowledge in word-occurrence patterns, another four on automated knowledge discovery using various kinds of document surrogates (search-engine templates, metadata headers, abstracts, MARC-encoded geospatial data), and a concluding essay on the significance of automated information retrieval for librarians. Each article takes on a distinct subtopic, complementing its neighbors and contributing to a largely satisfying whole.

At the same time, the collection suffers somewhat from not sufficiently tailoring its presentation to its primary audience. The authors are all well versed in the information science concepts underlying KDD, but unfortunately most working librarians lack such familiarity. Each article seems intended to introduce a particular aspect of KDD to the nonspecialist; only a few report new research. It is therefore doubly frustrating when bibliometric jargon and obscure statistical formulas are employed without explanation, as they frequently are in this volume. Such explanation would of course slow down a presentation and annoy information scientists, but by writing as if for JASIS (Journal of the American Society for Information Science), most of the authors have squandered an excellent chance to educate the working librarian and drive home the relevance of their topics.

A recurring theme in this volume is KDD's function of revealing the broader intellectual context of a scholarly work by using computer-aided association techniques to uncover links between two apparently unrelated articles. This process can have dramatic results. For instance, Don Swanson and Neil Smalheiser present a classic example of bibliographic KDD: linking articles through common citations to produce a promising but unsuspected idea for treating migraine headaches. In the following chapter, Kenneth Cory recounts how humanities researchers adapted Swanson and Smalheiser's methods and discovered an undocumented intellectual link between Robert Frost and the Greek philosopher Carneades. Henry Small's 331-article path from economics to physics is a spectacular demonstration of both the power of bibliographic association and the interrelatedness of knowledge. …