Web Usage Mining at an Academic Health Sciences Library: An Exploratory Study

Objectives: This paper explores the potential of multinomial logistic regression analysis to perform Web usage mining for an academic health sciences library Website.

Methods: Usage of database-driven resource gateway pages was logged for a six-month period, including information about users' network addresses, referring uniform resource locators (URLs), and types of resource accessed.

Results: Referring URL varied significantly by two factors: whether a user was on campus and what type of resource was accessed.

Conclusions: Although the data available for analysis are limited by the nature of the Web and concerns for privacy, this method demonstrates the potential for gaining insight into Web usage that supplements Web log analysis. It can be used to improve the design of static and dynamic Websites today and could be used in the design of more advanced Web systems in the future.

INTRODUCTION

The Web has had a significant impact on libraries, changing the formats and methods in which they present their resources and services to users. Most libraries use a Website as a tool for organizing and providing access to resources in print and electronic formats. The impact of the Web on libraries, however, has been more profound than providing a new access point for their users. The Web has also changed the information seeking behavior of and information use by library users and perhaps expanded the definition of who libraries' users are.

In the not-so-distant past, searchable bibliographic databases were limited in their availability, accessible only to information professionals or from workstations located in the library. The Web has changed this, allowing information discovery tools such as Alta Vista, Yahoo, and Google to become a part of everyday life for many. The increased availability of search tools seems to have changed the behavior of users, many of whom no longer need to come to the library as frequently as before. This change in the use of physical libraries is further demonstrated by a decline in foot traffic and the number of reference questions fielded by libraries in the past five years [1].

Despite the declining in-person usage of libraries, many users still depend on libraries to aggregate and provide access to a range of electronic resources. In the past, gate counts and reference and reshelving statistics were used to measure the use of physical resources. Assessment continues to be important in an electronic environment to determine the effectiveness of a library in shepherding its users to appropriate resources. In addition to the need for understanding how a library's own users access its resources, assessment also creates possibilities for identifying new markets for resources and services. The Web makes libraries virtually available to users who may not be physically able to access them. For some libraries, the Web also provides an opportunity to identify new services to better serve existing users, to possibly redefine who those users are, and to reach out to those who have not been served before.

This paper will explore the potential of a Web usage mining technique, regression analysis, to analyze navigational routes used to access the gateway pages of the Arizona Health Sciences Library (AHSL) Website over a six-month period.
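
To illustrate the kind of model named in the abstract, the following is a minimal sketch of multinomial logistic regression fitted by plain gradient descent. It is not the authors' implementation: the referrer categories, the predictor coding, and all counts are invented for illustration, and a statistical package would normally be used in practice to obtain significance tests.

```python
import math

# Hypothetical outcome categories for the referring URL (not from the study).
REFERRERS = ["library_home", "search_engine", "bookmark_or_direct"]

# Invented counts per (on_campus, resource_type) cell, in REFERRERS order.
TOY_COUNTS = {
    (False, "database"): [11, 22, 7],
    (True, "database"): [21, 4, 15],
    (False, "ejournal"): [5, 20, 15],
    (True, "ejournal"): [9, 3, 28],
}


def encode(on_campus, resource_type):
    """Intercept, on-campus indicator, and e-journal indicator (database is the reference)."""
    return [1.0, 1.0 if on_campus else 0.0, 1.0 if resource_type == "ejournal" else 0.0]


def softmax(scores):
    """Turn one score per category into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, x):
    """Probability of each referrer category for one encoded observation."""
    return softmax([sum(w * xi for w, xi in zip(ws, x)) for ws in weights])


def fit(rows, n_classes, epochs=3000, rate=0.5):
    """Minimise the average negative log-likelihood by batch gradient descent.

    Each row is (features, class_index, count), so repeated observations
    are stored once and weighted by their count.
    """
    n_features = len(rows[0][0])
    total = sum(count for _, _, count in rows)
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * n_features for _ in range(n_classes)]
        for x, y, count in rows:
            probs = predict_proba(weights, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_features):
                    grads[k][j] += count * err * x[j]
        for k in range(n_classes):
            for j in range(n_features):
                weights[k][j] -= rate * grads[k][j] / total
    return weights


rows = [
    (encode(*cell), y, count)
    for cell, counts in TOY_COUNTS.items()
    for y, count in enumerate(counts)
]
weights = fit(rows, len(REFERRERS))


def most_likely_referrer(on_campus, resource_type):
    """Referrer category with the highest fitted probability."""
    probs = predict_proba(weights, encode(on_campus, resource_type))
    return REFERRERS[probs.index(max(probs))]


for cell in TOY_COUNTS:
    fitted = predict_proba(weights, encode(*cell))
    print(cell, [round(p, 2) for p in fitted])
```

In this toy setup the fitted probabilities show how the most likely referrer shifts with user location and resource type. The study's actual categories and findings are not reproduced here.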

LITERATURE REVIEW

A number of articles have discussed Web server log analysis for libraries since libraries began to develop Web presences [2-9]. These articles describe summary-level metrics of Website usage, such as the total number of user sessions, broken down by variables such as date, time, or host domain of the requestor. As noted by many of these authors and by Goldberg [10], these studies have been constrained by two main factors. First, data provided by the hypertext transfer protocol (HTTP) that governs user transactions on the Web are very limited. Second, usage logs are designed for use by system administrators, not for tracking users. …
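
The summary-level metrics described above can be sketched as follows. This is only an illustration: it assumes the server writes the Apache combined log format, and the hostnames, dates, and paths are invented. It counts requests and distinct hosts rather than user sessions, since grouping requests into sessions needs further assumptions (such as an inactivity timeout) that are not specified here.

```python
import re
from collections import defaultdict

# Apache "combined" format: host, identity, user, [timestamp], "request",
# status, bytes, "referrer", "user agent". Assumed here, not confirmed by the source.
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Invented sample entries using reserved example domains and addresses.
SAMPLE_LOG = [
    'ws12.example.edu - - [03/Mar/2004:09:15:02 -0700] "GET /gateway/databases HTTP/1.1" 200 5120 "http://library.example.edu/" "Mozilla/4.0"',
    'ws12.example.edu - - [03/Mar/2004:09:17:40 -0700] "GET /gateway/ejournals HTTP/1.1" 200 7344 "http://library.example.edu/" "Mozilla/4.0"',
    'dialup7.example.net - - [03/Mar/2004:21:03:11 -0700] "GET /gateway/databases HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '192.0.2.44 - - [04/Mar/2004:08:00:59 -0700] "GET /gateway/ebooks HTTP/1.1" 200 2210 "http://search.example.com/?q=ebooks" "Mozilla/4.0"',
    'this line is not a valid log entry',
]


def host_domain(host):
    """Top-level domain of a hostname, or 'unresolved' for a bare IP address."""
    if re.fullmatch(r"[\d.]+", host):
        return "unresolved"
    return host.rsplit(".", 1)[-1].lower()


def summarize(lines):
    """Return requests per day, distinct hosts per day, and requests per host domain."""
    requests_per_day = defaultdict(int)
    hosts_seen = defaultdict(set)
    requests_per_domain = defaultdict(int)
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip malformed entries instead of failing
        day, host = match["day"], match["host"]
        requests_per_day[day] += 1
        hosts_seen[day].add(host)
        requests_per_domain[host_domain(host)] += 1
    hosts_per_day = {day: len(hosts) for day, hosts in hosts_seen.items()}
    return dict(requests_per_day), hosts_per_day, dict(requests_per_domain)


requests_per_day, hosts_per_day, requests_per_domain = summarize(SAMPLE_LOG)
print(requests_per_day)
print(hosts_per_day)
print(requests_per_domain)
```

Even in this small sketch, the only facts available about a visitor are the address, the timestamp, the request, and the referrer and user agent headers, which reflects the first constraint noted above.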