Academic journal article
By Geyer-Schulz, Andreas; Neumann, Andreas; Thede, Anke
Information Technology and Libraries, Vol. 22, No. 4
Library systems are a very promising application area for behavior-based recommender services. By mining the lending and search log files of online public access catalogs, customer-oriented service portals in the style of Amazon.com could easily be developed. Possible benefits include lower search and evaluation costs for readers, as well as better customer support and collection management for librarians. This article presents an architecture for distributed recommender services based on a stochastic purchase incidence model. It also describes experiences with a recommender service that has been operational within the scientific library system of the Universität Karlsruhe since June 2002.
Almost all scientific libraries run electronic library management systems. With their online public access catalogs (OPACs), they meet the requirements for electronic value-added services in almost the same way as digital libraries do. Recommender systems are a very promising add-on for traditional libraries. The need for them arises from scientists' and students' demand for efficient literature research, as shown by the survey of Klatt et al. (1) Because of information overload and the difficulty of assessing quality, among other things, information seekers are increasingly unable to compile relevant literature from conventional database-oriented catalog systems in a time-efficient manner. Therefore, as the survey reveals, they rely heavily on peers for recommendations. Given the tight schedules of many students, university teachers, and researchers, it is worth the effort to free up the valuable time they spend steering each other to the standard literature of their fields, a task that behavior-based expert advice services could easily take over. Moreover, in this scenario, they can also profit from the combined knowledge of all library users rather than the more restricted knowledge within their personal networks.
The consumer acceptance and convenience of recommender systems are demonstrated by the huge success of the wide variety of services offered at commercial bookstore sites (such as Amazon.com). People are getting used to these services and appreciate them. So the question to ask is: Why are these services not offered on a broader scale within scientific libraries? Discussions of this question with librarians and computer scientists revealed the following reasons:
* Privacy. Librarians are very considerate of the privacy of their patrons. Transaction-level data as well as reading histories must be protected.
* Budget restrictions. Public libraries generally operate under tight budget restrictions. New electronic services for millions of users might require prohibitively high additional information technology (IT) investments.
* Data size. The number of documents contained in many public or academic library systems is at least one order of magnitude higher than in most commercial organizations. This implies that transaction-level data is spread across more documents.
One would expect more data to improve the chance of finding meaningful patterns. In practice, however, the patterns become increasingly difficult to detect, both because they are sparse and because the computational complexity of counting such association rules is exponential in the number of objects. Standard association-rule algorithms reduce the complexity by deleting all objects that do not receive sufficient support. In a library context, unfortunately, the sparsity of the data makes this approach infeasible: raising the support threshold to reduce the computational complexity prunes all weak but meaningful association patterns that fall below the threshold yet are still statistically significant. This article presents a strategy to overcome these obstacles with behavior-based recommendations that can be efficiently generated from anonymous session data on off-the-shelf PC systems. …
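The pruning problem described above can be made concrete with a minimal sketch. It is not the stochastic purchase incidence model of this article, only plain support counting over document pairs, as standard association-rule algorithms do in their first passes. The session data, the function names `pair_support` and `prune_by_support`, and the threshold value are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(sessions):
    """Count, for each unordered document pair, the sessions containing both."""
    counts = Counter()
    for session in sessions:
        # Deduplicate and sort so that each pair is counted once per session.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts


def prune_by_support(counts, min_support):
    """Keep only the pairs observed in at least min_support sessions."""
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical anonymous sessions: the documents inspected together in one visit.
sessions = [
    ["A", "B"], ["A", "B"], ["A", "B"], ["A", "B"],  # popular textbooks
    ["C", "D"], ["C", "D"],                          # rarely used monographs
    ["A", "E"],                                      # a one-off co-occurrence
]

counts = pair_support(sessions)
frequent = prune_by_support(counts, min_support=3)
```

In this toy data, documents C and D appear together in every session in which either is used, yet the pair is discarded as soon as the threshold exceeds two sessions. In a sparse library catalog most document pairs look like C and D rather than like A and B, which is why a global support threshold tends to remove exactly the weak patterns of interest.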