Academic journal article Information Technology and Libraries

Filtering out Noise Lines from OPAC Downloads with Sed

Article excerpt

Internet access to other libraries' online catalogs has provided patrons and librarians alike with rich bibliographic data beyond the home institution's online public access catalog (OPAC). Users who know that a particular institution has strong collections in certain subject areas often can simply dial into that institution's OPAC and search through the holdings. Most communications software allows the user to download retrieved records to a file for adding to a bibliography or bibliographic database at a later time.

Of course, saving OPAC screens to a file will result in capturing many "noise lines," which form part of the source OPAC's user interface, in addition to the desired bibliographic information (as seen in the sample downloaded records shown in figure 1). One could import the file containing these and other records into a word processor and edit out the noise lines by hand or perhaps with the help of some macros. One could also write a program to strip out noise lines at the point of downloading, much as Hagee and Boewe have done for their institution's OPAC.(1)

Another solution would be to run the file of downloads through a text stream editor that can filter out noise lines. The stream editor sed performs this function well, and it is available to UNIX users on most workstations as well as to DOS users in the form of a public domain program or as part of the UNIX utility package for DOS, called MKS Toolkit.(2,3)

THE BASICS

The way sed works is simple: It matches on a text pattern and then performs some action on the pattern or on the line that contains the pattern. Suppose that a text file named "libcomp" uses the term "DOS" and the user wants to change that term to "MS-DOS." The sed editor can do this easily with the following simple script:

s/DOS/MS-DOS/     Example 1

The "s" at the beginning of the script stands for the "substitution" command. Patterns are always enclosed in "forward slashes" (/), with the pattern to be matched coming first and the pattern to substitute coming second. The syntax of this statement is then 's/pattern/replacement/'. An excerpt from the text file libcomp reads:

The Library uses PCs running DOS as terminals for its online catalog. It is important, then, for Library staff to have a working knowledge of DOS . . .

After going through the sed script, the text reads:

The Library uses PCs running MS-DOS as terminals for its online catalog. It is important, then, for Library staff to have a working knowledge of MS-DOS . . .
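The substitution can be tried directly in a shell. The following is a minimal sketch, assuming a POSIX shell and a standard sed; the variable names text and result are illustrative, and the line breaks in the sample are an assumption, since the excerpt does not show where the lines of libcomp end.

```shell
# Sample text from the libcomp excerpt; the line breaks are assumed.
text='The Library uses PCs running DOS as terminals for its
online catalog. It is important, then, for Library staff
to have a working knowledge of DOS . . .'

# Apply the substitution from example 1 to the text on standard input.
result=$(printf '%s\n' "$text" | sed 's/DOS/MS-DOS/')

# Show the edited text.
printf '%s\n' "$result"
```

Without further flags, the s command replaces only the first match on each line. Both occurrences are changed here because they fall on different lines; if a line contained "DOS" twice, the script would need a trailing g flag (s/DOS/MS-DOS/g) to change every occurrence.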

Note that pattern matching in sed is always case-sensitive. The command s/dos/MS-DOS/ would not result in any matches or substitutions in the passage cited above.
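Case sensitivity is easy to confirm. The sketch below, which assumes a POSIX shell and uses the illustrative variable names text and unchanged, runs the lowercase pattern against the first sentence of the passage and checks that the output is identical to the input.

```shell
# First sentence of the libcomp excerpt.
text='The Library uses PCs running DOS as terminals for its online catalog.'

# Lowercase "dos" does not occur in the text, so sed replaces nothing.
unchanged=$(printf '%s\n' "$text" | sed 's/dos/MS-DOS/')

# Compare the output with the input.
if [ "$unchanged" = "$text" ]; then
    echo "no substitution made"
fi
```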

Many word processors have an easy-to-use search-and-replace feature, but even at this elementary level sed offers the advantage over the word processor in that the sed script operates on and outputs ASCII text; there is no need to import the text into a word processor, make the change, and then possibly have to export the text back into ASCII format.

How does the user execute a command like the one in example 1? As an executable program, the tool sed is always invoked on the command line in the UNIX or DOS shell, but the command script to be executed may be either on the command line or in a file that the sed command calls. The examples in this article assume that scripts will be placed in files, since this is the most practical method for anything other than a very short script. Saving sequences of commands as a script in a file also allows the user to go back and easily edit the script if it does not give the desired result. To issue example 1 on the command line, the user would enter:

sed 's/DOS/MS-DOS/' libcomp     Example 2

The syntax of executing a sed command on the command line is then sed 'command' filename

The command must always be enclosed in single quotes so that sed recognizes it as a command string and not as a filename. …
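The two ways of supplying a script can be compared side by side. This is a minimal sketch, assuming a POSIX shell, a sed that supports the standard -f option for reading a script from a file, and the mktemp utility; the script file name dos.sed, the output file names, and the one-line stand-in for libcomp are all hypothetical.

```shell
# Work in a scratch directory so no existing files are touched
# (mktemp -d is assumed to be available, as on most current systems).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A one-line stand-in for the libcomp file.
printf '%s\n' 'The Library uses PCs running DOS as terminals.' > libcomp

# Script given on the command line, enclosed in single quotes (example 2).
sed 's/DOS/MS-DOS/' libcomp > from_cmdline.txt

# The same script stored in a file and read with the -f option.
printf '%s\n' 's/DOS/MS-DOS/' > dos.sed
sed -f dos.sed libcomp > from_file.txt

# Both runs should produce identical output; print it if they do.
cmp from_cmdline.txt from_file.txt && cat from_file.txt
```

In both cases sed writes the edited text to standard output and leaves libcomp itself unchanged, which is why the output is redirected to a new file here.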
