Academic journal article Information Technology and Libraries

Enrichment of Bibliographic Records on Online Catalogs through OCR and SGML Technology

Article excerpt

This article presents the results of research on the feasibility of using scanner technology to capture the contents pages of collective monographs, extract the bibliographic information for each individual work, and process it with a standardized markup language for tagging electronic documents, such as SGML. In this way, the data can be used as electronic information or stored in an online public access catalog (OPAC), thus providing additional access points. A pilot system was designed to test the initial hypotheses, demonstrate that the proposed goals are achievable, and develop the required tasks so that they can be carried out as automatically as possible.
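The extract-and-tag steps described above can be sketched in Python. This is a minimal illustration, not the authors' pilot system. It assumes the OCR step has already produced plain text lines. It also assumes a hypothetical contents-page layout of `Author: Title .... page` and uses hypothetical element names (`CONTRIB`, `AU`, `TI`, `PG`) rather than any published DTD. Lines that do not match the assumed pattern are set aside for manual review instead of being guessed at.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple
from xml.sax.saxutils import escape

# Hypothetical contents-page layout: "Author: Title .... page".
# Leader dots may be run together ("....") or spaced (". . .") after OCR.
ENTRY_RE = re.compile(
    r"^(?P<author>[^:]+):\s*(?P<title>.+?)\s*(?:\.\s*){2,}(?P<page>\d+)\s*$"
)


@dataclass
class Contribution:
    author: str
    title: str
    page: int


def parse_contents(lines: Iterable[str]) -> Tuple[List[Contribution], List[str]]:
    """Split OCR output into parsed entries and lines that need manual review."""
    parsed: List[Contribution] = []
    rejected: List[str] = []
    for raw in lines:
        line = " ".join(raw.split())  # collapse irregular OCR spacing
        if not line:
            continue
        m = ENTRY_RE.match(line)
        if m:
            parsed.append(
                Contribution(m["author"].strip(), m["title"].strip(), int(m["page"]))
            )
        else:
            rejected.append(raw)
    return parsed, rejected


def to_sgml(entries: List[Contribution]) -> str:
    """Tag each contribution with hypothetical SGML elements, escaping &, <, >."""
    blocks = []
    for e in entries:
        blocks.append(
            "<CONTRIB>\n"
            f"<AU>{escape(e.author)}</AU>\n"
            f"<TI>{escape(e.title)}</TI>\n"
            f"<PG>{e.page}</PG>\n"
            "</CONTRIB>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    ocr_lines = [
        "Smith, J.: Subject access in online catalogs ........ 1",
        "Garcia, M.: Markup & metadata . . . . 23",
        "CONTENTS",
    ]
    entries, review = parse_contents(ocr_lines)
    print(to_sgml(entries))
    print("For review:", review)
```

Real contents pages vary far more than this (multiple authors, titles wrapping across lines, section headings), so a fuller version would likely need several patterns, with the review list kept as a safeguard against OCR errors.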

Information Retrieval Systems (IRS) have evolved continuously since the first designs (Ellis 1996). Today, most of the features that once seemed esoteric to the lay user, such as natural-language queries, ranking of retrieved results by relevance, term weighting, query-by-example, and assistance with query formulation, have become standard and, indeed, essential in most information retrieval products (Frakes and Baeza-Yates 1992). One of the main consequences of implementing online catalogs was a significant increase in subject searching. Paradoxically, subject searching also became the most problematic kind of search (Larson 1991). To address these problems, research has focused on the three components of online catalog systems that determine the success of subject searching: retrieval and search-processing methods, the user interface, and the database.

Advances in the first two of these are being made at an acceptable pace (for example, best-match and nearest-neighbor searching, hypertext links, and graphical user interfaces). However, the contents and structure of catalog databases remain deficient, despite new technological possibilities (Larson et al. 1996).

To overcome this, one of the main improvements proposed is to enrich record contents with more subject information. The underlying hypothesis is that the larger the volume of significant informative elements in the system, the greater the likelihood of retrieval.
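The hypothesis can be illustrated with a toy Boolean inverted index. This is a sketch under simple assumptions (lower-cased keyword tokenization and an AND query, not the retrieval method of any particular OPAC), and the record and chapter titles are hypothetical. A query on a topic treated only in one chapter finds nothing in the title-only record but retrieves it once the contents entries are indexed.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map every term to the IDs of the records whose fields contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for rec_id, fields in records.items():
        for field in fields:
            for term in tokenize(field):
                index[term].add(rec_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Boolean AND search: return the records that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical record for a collective work: title field only.
    base = {"rec1": ["Proceedings of a conference on library automation"]}
    # The same record enriched with (hypothetical) contents-page entries.
    enriched = {"rec1": base["rec1"] + [
        "Subject access in online catalogs",
        "Markup and metadata for electronic documents",
    ]}
    query = "subject access"
    print(sorted(search(build_index(base), query)))      # []
    print(sorted(search(build_index(enriched), query)))  # ['rec1']
```

Each added field contributes new terms, and therefore new access points, to the index; whether the extra matches are relevant is a separate question of precision that this toy example does not address.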

It is now clear that, beyond trying to improve retrieval results, library systems must be revised to cope with all kinds of materials and electronic formats and to offer full access to both internal and external information resources. It is not a question of going back to square one but, as Croft (1995) said of current lines of IR research, a question of reorienting efforts.

Based on the most significant current research, we can say that this reorientation must be directed toward electronic publishing standards, communications standards, search mechanisms, and user interface systems. Therefore, any tool that provides additional access points to the records of an online catalog would not only improve functionality but also help catalogs adapt to new developments. This research belongs to a growing body of research programs in a constantly changing environment.

Conceptual Approach and Background

One of the main conclusions drawn from our reading of the literature was that the databases of current online catalogs must offer additional access points. Most importantly, these should direct the user to articles in periodicals and to individual contributions to collective works (conference proceedings, collections, anthologies, etc.).

Incorporating information that gives access to journal articles is becoming less problematic, as publishers are starting to supply references in standardized formats. Furthermore, projects that automatically extract bibliographic information from abstracts in scientific journals and load it into online catalogs using SGML (Standard Generalized Markup Language) have recently been carried out (Harrison et al. 1995).

To increase the subject content of monograph records in these catalogs, researchers have investigated the use of information from abstracts, indexes, and tables of contents (Van Orden 1990). …
