Gathering Usage Statistics at an Environmental Health Library Web Site

During the summer of 1995, the Department of Environmental Health (DEH) Library at the University of Washington decided to create a Web site. It had three main goals: 1) to serve as an orientation tool for new students, faculty, and staff, as well as for community members, by introducing them to the Environmental Health Library and its holdings, layout, and policies; 2) to serve internally (for the department and the university) and externally (for individuals and institutions) as an annotated and well-indexed connection to the best environmental and related links on the Web, and eventually to contain internal bibliographies and documents; and 3) to serve as a publicity mechanism to advertise our library and department and to connect people to them.

The University of Washington academic year begins in late September, so we got our home page up in time for incoming students. We advertised our site internally and externally and included a counter on our page for a gross estimate of use.

However, we wanted more specific feedback about how our site was being used. Given the time invested in building and maintaining our own Web site, the vast number of people using the Web and creating new Web sites elsewhere, and the rapid pace of technological change on the Internet, gathering such feedback seemed advisable if we were to maintain the most useful, effective, and well-used site possible. We wanted to know how often our site was used, who was using it, and which pages got the most use.

Encouraged by Gerald Van Belle, chairperson of the Department of Environmental Health and a biostatistician, we looked at a number of software programs written to gather Web statistics and found that several would allow us to perform various analyses. The programs we ultimately applied to our task are all Perl scripts, distributed at no cost on the Web.

We used them to gather data (some of which is presented in the accompanying sidebars) for an extended period during the 1995-96 academic year. The results you see here were generated from the April 1996 access and referrer logs for the DEH Library Web pages (http://weber.u.washington.edu/~dehlib). The logs record information about requests for Web pages that were presented to our server. We chose the April logs because Musage, one of the statistics programs we used, automatically splits its output into separate files for each month, unlike some other programs. We therefore used a single month throughout for ease of comparison.

Below are descriptions of how we used the three statistics programs selected for this project, along with information about the programs' availability.

The Programs and How We Used Them

Wwwstat 1.0

(Available from UC Irvine, Department of Information and Computer Science; http://www.ics.uci.edu/pub/websoft/wwwstat)

Wwwstat generates formatted output files. (See the sidebar "Wwwstat 1.0 Statistics for DEH Library.") Most results are arranged alphabetically by domain, name, country, or file name. It provides the following information:

* Number of files requested, by country, domain (a more specific location), and subdomain (a department, as in the University of Washington)

* Number of requests, as follows: average number of requests per day; number of requests during each hour of the day (to determine heavy use times of day); and total number of requests on each day (to determine overall pattern)

* Number of times each page was requested
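The kinds of tallies listed above can be illustrated with a short sketch. It is written in Python rather than Perl and is not Wwwstat's own code. It assumes the access log uses the Common Log Format, and the host names, the page name policies.html, and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: Common Log Format, i.e.
#   host ident user [day/Mon/year:HH:MM:SS zone] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def summarize(lines):
    """Count requests per hour of day, per calendar day, and per page."""
    by_hour, by_day, by_page = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        # Timestamp looks like "12/Apr/1996:14:03:07 -0700"
        day, month, rest = match.group("time").split("/", 2)
        year, hour, _minute, _second = rest.split()[0].split(":")
        by_hour[int(hour)] += 1
        by_day[f"{year}-{MONTHS[month]:02d}-{int(day):02d}"] += 1
        by_page[match.group("path")] += 1
    days = len(by_day)
    average = sum(by_day.values()) / days if days else 0.0
    return {"by_hour": by_hour, "by_day": by_day,
            "by_page": by_page, "avg_per_day": average}


# Hypothetical sample entries, for illustration only.
SAMPLE = [
    'host1.example.edu - - [12/Apr/1996:14:03:07 -0700] "GET /~dehlib/index.html HTTP/1.0" 200 2048',
    'host2.example.com - - [12/Apr/1996:14:45:10 -0700] "GET /~dehlib/policies.html HTTP/1.0" 200 1024',
    'host1.example.edu - - [13/Apr/1996:09:15:00 -0700] "GET /~dehlib/index.html HTTP/1.0" 200 2048',
]

summary = summarize(SAMPLE)
for page, count in summary["by_page"].most_common():
    print(count, page)
```

Counting by hour and by day from the same pass over the log is what makes it possible to see both the busiest times of day and the overall pattern across the month.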

Wwwstat can be customized to ignore specified file names and computer addresses. This filters out requests for images and test files, as well as most staff accesses. In our case, it does not filter out use by staff at locations outside the DEH Library.
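A minimal sketch of this kind of filtering follows. It is in Python and does not reproduce Wwwstat's actual configuration syntax. The exclusion patterns and host names are hypothetical, chosen only to show why requests from machines not on the exclusion list are still counted.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion lists: image files, test pages, library staff machines.
IGNORED_PATHS = ["*.gif", "*.jpg", "*/test*"]
IGNORED_HOSTS = ["libstaff*.example.edu"]


def is_counted(host, path):
    """Return True if a request should be included in the statistics."""
    host, path = host.lower(), path.lower()
    if any(fnmatchcase(path, pattern) for pattern in IGNORED_PATHS):
        return False  # image or test file
    if any(fnmatchcase(host, pattern) for pattern in IGNORED_HOSTS):
        return False  # known staff machine
    return True


# Hypothetical requests as (host, path) pairs.
requests = [
    ("host1.example.edu", "/~dehlib/index.html"),
    ("host1.example.edu", "/~dehlib/logo.gif"),
    ("libstaff1.example.edu", "/~dehlib/index.html"),
    ("dialup7.example.net", "/~dehlib/index.html"),  # could be staff working elsewhere
]

counted = [request for request in requests if is_counted(*request)]
print(len(counted), "of", len(requests), "requests counted")
```

Because the filter matches only on address and file name, a staff member connecting from a machine outside the listed addresses is indistinguishable from any other visitor.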

Wwwstat does not have a Domain Name System (DNS) name lookup function. (Each server has a DNS name, such as "washington.edu" or "sedona.net.") When the log files contain a numeric IP (Internet Protocol) address (e. …