Small-Town Newspapers Online: Smaller Newspapers Have Been Slower to Jump on the Online Bandwagon, Largely Due to the High Cost of Converting Pages to Digital Format and Extracting Text with OCR Software


For several years now, students, writers, researchers, librarians, and others with a need, or simply a desire, to look into the past have enjoyed increasing access to both contemporary and archival editions of the country's largest newspapers. Some, like the Dallas Morning News, with a historical archive hosted by NewsBank covering 1885-1977 [], and the Augusta (Ga.) Journal, whose archive goes from 1821 to 2000 [], have financed their own transition to digital versions. Others, including The New York Times and The Washington Post, have teamed with commercial database provider ProQuest Historical Newspapers [], which digitizes and OCRs newspaper pages from microfilmed images. In all cases, the results are made available via the Internet to universities, libraries, and individuals on a fee basis.

Smaller newspapers have been slower to jump on the online bandwagon, however, largely due to the high cost of converting pages to digital format and extracting text with OCR software. Add to that the cost of online storage and the software necessary to make papers accessible via the Internet, and many smaller papers are out of the running before they begin.

Some small newspapers have found their way to the Web with the help of services such as NewspaperARCHIVE [], created by Heritage Microfilm, and Paper of Record [www.paper], produced by Cold North Wind. These services put papers online at no cost to the publishers and then charge the public a fee for access. NewspaperARCHIVE claims 27.8 million pages from 1,563 titles and charges $49.95 per year or $6.95 per month for access to the full text. It has some newspapers from outside the U.S. (Canada, U.K., Ireland, Denmark, Jamaica, South Africa, and the U.S. Virgin Islands), but coverage can be very sporadic. For example, there is only one newspaper from Denmark, Politiken, and only five of its years (1884, 1936, 1982, 1992, and 2002) are available. Even those years aren't complete: for all of 1982, NewspaperARCHIVE contains only three pages from a single day. Likewise, the only South African newspaper in the database, the Sunday Times, is represented by 27 dates in 1945, with only a few pages from each issue.

Paper of Record says it has about 8.5 million pages digitized from newspapers published in 16 countries. Some titles, however, are represented by only a few years. Paper of Record costs $99.99 per year or $16.75 per month. It is strongest in Canadian titles, which is not surprising since Cold North Wind is a Canadian publisher. It also includes a number of Mexican newspapers. Its U.S. coverage is weak. Click on Florida, for example, and all that's there is the Deland Beacon, with 11 pages from 2004.

Aside from the spotty coverage, both in terms of geographic spread and completeness, there's another catch: Like ProQuest, both NewspaperARCHIVE and Paper of Record work almost exclusively with automated systems that scan and OCR text from microfilmed images. Neither one usually works from original newspaper pages.

Unfortunately, this leaves out several hundred small-town newspapers that have never put their archives on microfilm because the cost was prohibitive. Most simply stored their back issues in stacks or boxes in back rooms, or archived them in bound volumes, which at least slowed the rate of deterioration. (All newsprint, even if it's stored in a temperature-controlled room, is subject to deterioration. Natural chemical reactions between acids and other components of cheap pulp paper will eventually turn the paper to dust, and newsprint has traditionally been the cheapest paper available.)

Some publications have incomplete microfilm archives, done in fits and starts over the years. Even those publishers with full archives can be reluctant to commit their papers to being digitized from microfilm or microfiche due to quality considerations. …
