The Design and Interpretation of Unobtrusive Evaluations

If one sets aside the specifically unobtrusive aspect of unobtrusive evaluation, what remains is nothing more than a standardized test. To be valid, such a test must administer the same questions to each subject, and its proper use is for measuring relative performance, for comparison and ranking, not for assessing the overall quality of library service. Support for this argument is found in the description of the methodology employed in a number of such studies, and in the reanalysis of data published in McClure and Hernon (1983) and Dilevko and Dolan (1999). The "55 percent rule" of Hernon and McClure (1986) is rejected as a spurious generalization. Unobtrusive evaluation has potential use in comparing the effectiveness of different ways libraries may organize collections and services.


Unobtrusive evaluation of reference service has a long and, in the minds of some, controversial history. Prior to the first such work, in the late 1960s, research on reference service was largely confined to quantitative studies of the simplest kind (number of reference books, total budget, number of staff), self-assessment of success by reference staff (always favorable), and reports on user satisfaction. (1) Of the last, Rothstein famously said that the testimonials could not have been better had we paid for them. (2) Rothstein's work provides a survey of reference service evaluations to the 1960s.

It came as a surprise therefore when unobtrusive evaluations of reference service began to appear, undermining the self-confidence and perhaps the self-satisfaction of the library community. "Unobtrusive evaluation" and "unobtrusive test" are used interchangeably herein, although strictly speaking the latter is a special case of the former. In 1967, Crowley, inspired by Webb's work on social science methodology, undertook an unobtrusive study of reference service in public libraries in New Jersey. (3) This was followed by the work of Childers, who extended Crowley's study, using more sophisticated statistical analysis. Their studies, originally Rutgers University doctoral dissertations, were published together in 1971. (4)

Their work led to many similar studies, some of them published, some of them carried out for internal use at one institution or another. (5) These studies share some common features, most notably a set of strictly informational questions, whether specially created for the study or derived from actual, observed reference interactions, which are administered unobtrusively, or surreptitiously, by proxies posing as ordinary library users. The results are then used to compare the quality of reference in various kinds of libraries, for example, academic versus public or libraries in different regions, or to investigate correlation between quality of service and variables such as number of volumes and hours of service. In most cases, investigators have also reported an overall aggregated success rate in answering questions, which is consistently, even shockingly, low. It is this latter result that has attracted greatest attention, leading Hernon and McClure in 1986 to posit a "55 percent rule." Hernon and McClure concluded, based upon evidence from a range of such studies, that a user asking an informational question in a library generally has a 55 percent chance of getting a correct answer. (6)

Unobtrusive testing is not without its critics, who have argued that its questions are unrepresentative and cover only a small part of the services rendered at a reference desk, that the samples are too small, and that other indicators, such as the willingness of users to return, are better measures of success. Such criticism is usually in response to the reported low overall success rate. Discussion of unobtrusive testing now comprises a considerable literature. (7)

This article will not argue for or against unobtrusive testing. The threefold objective here is rather different: to argue, first, that all subjects in such tests must be administered the same test; second, that such studies are chiefly useful for measuring relative performance, that is, for purposes of comparison and ranking, not for assessing overall, global quality of service; and third, that the one area where unobtrusive tests may have some use is in comparing and evaluating the various modes libraries use for organizing collections and delivering services. …

