Analyzing Web Server Logs to Improve a Site's Usage

Article excerpt

One of the projects that I've been working on in the last few weeks involves trying to increase the visibility and effectiveness of one of the Web sites I maintain. I've mentioned my involvement with Vanderbilt University's Television News Archive in previous columns. We have a Web site (http://tvnews.vanderbilt.edu) that provides access to the archive's large collection of news programming through a searchable database. The archive's staff members populate this database with detailed information about each segment of news programming that is recorded, complete with a narrative abstract. Through this database, people can find items that meet their personal interests or research needs. Some users find the information they seek by reading the abstracts in the database, but others need to view the video itself. We offer a videotape loan service; we make copies of news programs that users can borrow. The database interface includes an e-commerce function that allows users to select the items they want to view and pay the service fees we charge to recover our costs.

The effectiveness of the TV News Archive's Web site can ultimately be measured by the number of successful searches performed and the quantity of videotape requests placed. While we have seen a significant increase in these activity levels over the course of the last few years, we continue to look for ways to boost activity even more. We don't believe that we have yet reached all potential users, nor that those who do visit our site always find the material they seek.

Recently, I've been looking for ways to increase the activity on the Web site and, hopefully, to boost the number of videotape loan requests. This project includes two lines of investigation. One is streamlining and optimizing how the Web site works to improve its usability; the other is devising strategies to improve the site's visibility and discoverability on the Web. During the past few weeks, I've been working on ways to expose more of our metadata on the open Web to increase use of the site. I'll probably talk about that in a future column--this one focuses on the methods of analysis available for studying the site's usage, which are necessary for identifying problems and making improvements.

One of the main characteristics of our archive is that we get very few on-site visitors. Almost all use comes from remote users via the Web site, so it's essential that it work well. I've been studying the site's usage in detail. While some of the techniques I've used may be particular to our site, most apply to analyzing any library site.

Conducting in-person usability studies and focus groups is one of the best ways to learn about the usability of a Web site. While we have done that, I'm now taking a more forensic approach--analyzing logs and other system data to measure the effectiveness of the Web site design and search engine.

Studying Web Server Logs

Web server logs provide a wealth of raw data about how users approach your site. If you host your own site, you should have easy access to them. People who run their sites on external servers may need to negotiate with the systems administrator for access to the log files. All Web servers accumulate detailed information for each page requested. Exactly which elements the server records varies according to the type of software used and the options set by the site's administrator. Let's walk through some of the most important data elements.
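To make this concrete, here is a minimal sketch in Python of reading such a log, assuming the server writes its entries in the widely used Apache-style common log format; the file name access.log and the regular expression are illustrative assumptions rather than a description of any particular server's configuration.

import re

# One Apache-style common log format entry per line:
# host identity user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    # Return a dictionary of the data elements recorded for one request,
    # or None if the line doesn't match the expected format.
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

with open("access.log") as log:          # illustrative file name
    for line in log:
        entry = parse_line(line)
        if entry:
            print(entry["host"], entry["request"], entry["status"])

Each parsed entry exposes the elements discussed below, so the same dictionary can feed whatever tallies or reports you find useful.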

The page requested identifies the document the server was asked to deliver. The HTTP status code indicates whether the request was successful or whether some other condition occurred. This is represented by a three-digit number: 200 indicates that the page was delivered successfully; the dreaded 404 means the page could not be found. Many other status codes have been defined and are fully documented by the World Wide Web Consortium at http://www. …
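As a brief illustration of putting the status codes to work, the following sketch tallies how often each code appears in a log file, which makes error responses such as 404 easy to spot; again, the file name and the pattern are assumptions made for the example, not a fixed recipe.

import re
from collections import Counter

# In Apache-style logs the status code follows the quoted request line.
STATUS_PATTERN = re.compile(r'" (\d{3}) ')

status_counts = Counter()
with open("access.log") as log:          # illustrative file name
    for line in log:
        match = STATUS_PATTERN.search(line)
        if match:
            status_counts[match.group(1)] += 1

# Print the codes from most to least frequent.
for status, count in status_counts.most_common():
    print(status, count, "requests")

A report like this quickly shows whether visitors are mostly receiving pages successfully or running into missing documents and server errors.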