Mining User Communities in Digital Libraries

Article excerpt

Interest in analyzing library user behavior has grown rapidly since the advent of digital libraries and the Internet. In this context, the authors analyze the queries posed to a digital library and recorded in Z39.50 session log files, and use data-mining techniques to construct communities of users with common interests. One of the main concerns of this study is the construction of meaningful communities that can be used to improve information access. Analysis of the results brings to the surface some important properties of the task, suggesting that a common methodology is feasible.

**********

Information services aim to satisfy the needs of their users precisely and effectively. Many of these services use intelligent information retrieval and filtering techniques to personalize and customize their content to the users' interests and preferences. (1) Several information providers exploit user modeling techniques to understand and evaluate the usage of their services. The study of user behavior has become central to a number of digital library projects, where it identifies critical factors in the design and development of a digital library. (2) Data mining offers powerful techniques for discovering nontrivial and useful patterns in voluminous datasets. Many such techniques have been applied to information services, especially on the Web, to offer personalization and improve information access. (3) The application of such techniques in library systems constitutes the bibliomining domain, which aims to improve almost all decision-making processes concerning library and information management.

The goal of this paper is to show how the administrator of a digital library can analyze user behavior and extract the data necessary for improving information access. In particular, the authors are interested in formulating community models, which represent patterns of usage of digital libraries and can be associated with different types of users. The authors' main concern is finding associations among the queries posed to the digital library. Similar queries, recorded in Z39.50 session log files, are grouped into clusters. The clusters map to user community models that represent the users' demands and querying habits. Such models could be used in a query-expansion process, contributing to more efficient retrieval. Generally, query analysis can benefit both the digital library and its users in many ways:

* Service optimization--helping the administrators reorganize the digital library content, authorities, and user interfaces, making them more suitable for different user groups

* Decision support--helping the administrator form an effective query expansion strategy for the digital library

* Personalization--helping the users identify information of interest to them by recommending similar subjects
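
The grouping of similar queries into clusters, mentioned before the list above, can be illustrated with a minimal sketch. This excerpt does not specify the authors' similarity measure or clustering algorithm, so everything here is an assumption made for illustration: queries are treated as plain term sets, similarity is the Jaccard coefficient, and a greedy single-pass grouping stands in for whatever data-mining technique the study actually applies. The function names, the threshold value, and the sample queries are hypothetical.

```python
from typing import List, Set


def tokenize(query: str) -> Set[str]:
    """Lowercase a query string and split it into a set of terms."""
    return set(query.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared terms divided by all distinct terms."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_queries(queries: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy single-pass grouping (an illustrative stand-in, not the authors' method).

    Each query joins the first cluster whose pooled vocabulary is similar
    enough; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    vocab: List[Set[str]] = []  # pooled terms of each cluster
    for q in queries:
        terms = tokenize(q)
        for i, v in enumerate(vocab):
            if jaccard(terms, v) >= threshold:
                clusters[i].append(q)
                vocab[i] = v | terms
                break
        else:
            # No existing cluster was similar enough: open a new one.
            clusters.append([q])
            vocab.append(terms)
    return clusters


# Hypothetical queries; a real run would read them from the session logs.
sample = [
    "neural networks",
    "neural networks training",
    "byzantine history",
    "history of byzantine art",
]
communities = cluster_queries(sample)
```

In a sketch like this, the pooled vocabulary of each cluster plays the role of a very rough community model: its terms are candidates for expanding a new query that falls into the same cluster. The result depends on the order of the queries and on the threshold, which is one reason a real study would likely prefer a more principled clustering method.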

This project is motivated by the need to improve the querying mechanisms provided by the digital library of the Hellenic National Documentation Center (NDC) (http://theses.ndc.gr), which is one of the most significant in Greece and consists of many collections that are unique worldwide and of international interest. The digital library of NDC is aimed at a number of user groups, mainly in Greece and from a variety of scientific domains, including students, researchers, professionals, and librarians.

In the following section, the problem of creating user communities is described, together with the methods followed. In addition, the digital collections of NDC, their characteristics, the user groups they target, and the operations the system provides are discussed. In the Experimental Results section, the authors' methods are applied to two different collections of the NDC. Finally, a number of interesting issues arising from this work are presented for further research. …