Disappearing Act: Decay of Uniform Resource Locators in Health Care Management Journals

Objectives: This study examines the problem of decay of uniform resource locators (URLs) in health care management journals and seeks to determine whether continued availability at a given URL relates to the date of publication, the type of resource, or the top-level URL domain.

Methods: The authors determined the availability of web-based resources cited in articles published in five source journals from 2002 to 2004. The data were analyzed using correlation, chi-square, and descriptive statistics. Attempts were made to locate the unavailable resources.
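As an illustration only, an availability check of this kind might be sketched in Python as follows. The excerpt does not describe the authors' actual procedure, so everything here is an assumption: the function names (`fetch_status`, `classify`, `check_twice`), the user-agent string, the rule that any 2xx or 3xx response counts as "active," and the rule that a URL is retried once before being recorded as inactive.

```python
from http.client import HTTPException
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    # Hypothetical user-agent string; some servers reject requests without one.
    try:
        request = Request(url, headers={"User-Agent": "url-decay-check/0.1"})
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (OSError, HTTPException, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def classify(status):
    """Assumed rule: 2xx and 3xx responses count as active, all else inactive."""
    if status is not None and 200 <= status < 400:
        return "active"
    return "inactive"


def check_twice(url, fetch=fetch_status):
    """Record a URL as inactive only if two separate attempts both fail."""
    if classify(fetch(url)) == "active":
        return "active"
    return classify(fetch(url))
```

A status-code check of this kind cannot tell whether the content at a live URL is still the content that was cited, so it would tend to understate decay; a manual review of each page would be needed to catch revised or replaced documents.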

Results: After checking twice, 49.3% of the original 2,011 cited resources could not be located at the cited URL. The older the article, the more likely that URLs in the reference list of that article were inactive (r = -0.62, P<0.001, n = 1,968). There was no statistically significant difference in availability across resource types (χ² = 5.28, df = 2, P = 0.07, n = 1,786). Whether a URL was active varied by top-level domain (χ² = 14.92, df = 4, P = 0.00, n = 1,786).
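For readers unfamiliar with the chi-square test of independence reported above, the statistic can be computed from a contingency table of availability by category, as in the minimal Python sketch below. The counts are invented for illustration and are not the study's data. The two-row, five-column layout is an assumption chosen only because it yields df = 4, and the function name `chi_square` is hypothetical.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    table is a list of rows; each row is a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if availability were independent of category.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Invented counts: rows are active and inactive URLs, columns are five
# hypothetical top-level domain groups.
example = [
    [40, 55, 30, 25, 20],
    [35, 30, 45, 30, 25],
]
statistic, df = chi_square(example)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

The P value is then read from the chi-square distribution with the given degrees of freedom, which a statistics package would normally supply.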

Conclusions: URL decay is a serious problem in health care management journals. In addition to using website archiving tools like WebCite, publishers should require authors to both keep copies of Internet-based information they used and deposit copies of data with the publishers.


Article citations serve many purposes. Writers use references to credit other authors' ideas. Citation analysis is used to study trends in a particular field. Researchers use references to find original or additional sources of information.

Locating cited Internet-based resources can be difficult because the original documents may have been removed from the web or their content may have been revised or altered. Other Internet resources may still exist, but their addresses, uniform resource locators (URLs), may have changed, rendering cited URLs obsolete. Additional resources may be hosted behind members-only interfaces, where they may be impossible or expensive to obtain. Koehler believes that because of these characteristics, "web documents are not the same thing as published and immutable works. Nor do they disappear the very moment they are uttered or broadcast. The WWW represents a third model that coexists between the recorded and the unrecorded." He continues, "Because it is a new medium, we have not yet fully identified the dynamics of its behavior" [1].


A number of studies have examined resource inaccessibility at cited URLs, known variously as URL decay [2] or link rot [3]. Koehler produced three now-classic longitudinal studies of a sample of web pages [1, 4, 5], and Bar-Ilan and Peritz examined informatics web pages [6]. Other studies have examined, among other sources, print and online bibliographies of Internet pages [3, 7, 8], undergraduate student papers [9-12], conference papers [13, 14], online public access catalogs (OPACs) [15], and MEDLINE citations [16-18]. Many researchers have studied references in scholarly journal articles. Fields examined include, but are not limited to, biomedicine [2, 19-26], biomedical informatics [27], business [28], communications [29, 30], computer science [31], ecology [32], law [33], and library and information science [34-38]. Another set of articles looks at trends in journals across several fields [39-43].

These studies, which used varying methodologies and timeframes, reported widely differing percentages of found URLs. Sellitto reported the highest success rate, finding that 96% of citations in conference papers were available within a year of publication [13]. Tyler and McNeil, who examined website bibliographies, reported the lowest rate of successful access, finding only 20% of URLs 7 years after publication [3]. Among studies of scholarly journal citations, Zhang reported the highest percentage of found URLs, locating 69% after 1 year [38]. Thorp and Brown found the lowest percentage, locating 39% of citations between 1 and 6 years old [25]. …