The Design of a Relational Database for Large-Scale Bibliographic Retrieval


A fully normalized relational bibliographic database promises relief from the update, insertion, and deletion anomalies that plague bibliographic databases using (US) MARC formats internally. The conceptual design of a full-scale bibliographic database (including bibliographic, authority, holdings, and classification data) is presented, based on entity-relationship modeling. This design translates easily into a logical relational design. The treatment of format integration and the differentiation between the intellectual and bibliographic levels of description and between collective and individual levels of description are discussed. Unfortunately, the complexities of bibliographic data result in a tension between the semantic integrity of the relational approach and the inefficiencies of normalization and decomposition. Compromise approaches to the dilemma are outlined.

According to Date, "It is undeniable that the relational approach represents the dominant trend in the marketplace today, and that the 'relational model' . . . is the single most important development in the entire history of the database field."(1) Despite the popularity of this data model in the larger database world, large-scale bibliographic databases have typically retained data structures based on the complex but unitary MARC record. While some degree of relationality is implied (for example, in the use of the USMARC bibliographic, authority, holdings, and classification formats), only limited direct interrecord linkage occurs;(2) the bibliographic database world is still considerably more record-oriented than relation-oriented. (References to MARC in this article are often USMARC-specific. Nevertheless, the larger conclusions presented here should also hold for other MARC implementations.)
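The difference between record-oriented and relation-oriented storage shows up most plainly in the update anomaly mentioned in the abstract. The following is a minimal sketch, not the design reported in this article. It assumes Python's built-in sqlite3 module as a stand-in relational engine, and the table names (bib_record, authority, bib) and sample headings are hypothetical. In the record-oriented layout every record carries its own copy of a name heading, much as each MARC bibliographic record carries its own 100 field. In the normalized layout the heading is stored once and referenced by key.

```python
import sqlite3

OLD = "Smith, John, 1900-"
NEW = "Smith, John, 1900-1980"

def build_record_oriented(conn):
    # Hypothetical record-oriented layout: each record stores its own
    # copy of the author heading as a text string.
    conn.execute(
        "CREATE TABLE bib_record (bib_id INTEGER PRIMARY KEY, title TEXT, author_heading TEXT)")
    conn.executemany("INSERT INTO bib_record VALUES (?, ?, ?)",
                     [(1, "Title A", OLD), (2, "Title B", OLD), (3, "Title C", "Jones, Mary")])

def build_normalized(conn):
    # Hypothetical normalized layout: the heading is stored once in an
    # authority table and referenced by key from each bibliographic row.
    conn.executescript("""
        CREATE TABLE authority (auth_id INTEGER PRIMARY KEY, heading TEXT UNIQUE);
        CREATE TABLE bib (bib_id INTEGER PRIMARY KEY, title TEXT,
                          auth_id INTEGER REFERENCES authority(auth_id));
    """)
    conn.executemany("INSERT INTO authority VALUES (?, ?)", [(10, OLD), (11, "Jones, Mary")])
    conn.executemany("INSERT INTO bib VALUES (?, ?, ?)",
                     [(1, "Title A", 10), (2, "Title B", 10), (3, "Title C", 11)])

# Record-oriented: every copy of the heading must be rewritten; missing
# one would leave the file inconsistent (the update anomaly).
rec = sqlite3.connect(":memory:")
build_record_oriented(rec)
rec_rows = rec.execute(
    "UPDATE bib_record SET author_heading = ? WHERE author_heading = ?", (NEW, OLD)).rowcount

# Normalized: a single authority row changes; bibliographic rows follow by key.
norm = sqlite3.connect(":memory:")
build_normalized(norm)
norm_rows = norm.execute(
    "UPDATE authority SET heading = ? WHERE heading = ?", (NEW, OLD)).rowcount
titles = [t for (t,) in norm.execute(
    "SELECT b.title FROM bib b JOIN authority a ON a.auth_id = b.auth_id "
    "WHERE a.heading = ? ORDER BY b.title", (NEW,))]

print(rec_rows, norm_rows, titles)  # 2 1 ['Title A', 'Title B']
```

The cost shown here grows with the number of records sharing a heading. In the record-oriented case it is one rewrite per record. In the normalized case it is a single row no matter how many records cite the heading.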

Given the general effectiveness of relational databases, it is not surprising that the idea of redesigning bibliographic databases according to relational database principles has cropped up from time to time in the literature.(3) Starting from a similar premise, a seminar conducted in the College of Library and Information Services (CLIS) at the University of Maryland in fall 1994(4) undertook to establish the basic logical design of a large-scale bibliographic database using the entity-relationship (ER) model, with a view to the eventual conversion of the ER-based conceptual schemes into a relational database. The results of that undertaking are reported and discussed in this article.

Background

MARC and Database Design

In the bibliographic world, the MARC family constitutes an unrivaled standard. Designed for the transfer of bibliographic data in machine-readable form, specifically on magnetic tape, the MARC formats are, first and foremost, communication formats. For lack of suitable alternatives, they have also been used as storage formats. In the standard three-level database architecture,(5) which distinguishes among internal schemes (physical data storage structures), conceptual schemes (logical, community-wide views of the data), and external schemes (user views of the data, especially in terms of output reports or screen displays), the MARC formats generally correspond most closely to external schemes.
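To illustrate how an external scheme can be derived from a conceptual one, the sketch below defines a SQL view that presents normalized rows as tagged, MARC-like display lines. It is a hypothetical simplification. It uses Python's sqlite3 module, invented table and view names (authority, bib, marc_like_display), and only two USMARC tags (100, personal-name main entry; 245, title statement). Indicators, subfields, and the leader are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level (hypothetical, simplified): normalized tables.
    CREATE TABLE authority (auth_id INTEGER PRIMARY KEY, heading TEXT NOT NULL);
    CREATE TABLE bib (bib_id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                      auth_id INTEGER REFERENCES authority(auth_id));

    -- External level: a view presenting each record as tagged display
    -- lines (100 = personal-name main entry, 245 = title statement).
    CREATE VIEW marc_like_display AS
        SELECT b.bib_id AS bib_id, '100' AS tag, a.heading AS value
          FROM bib b JOIN authority a ON a.auth_id = b.auth_id
        UNION ALL
        SELECT bib_id, '245', title FROM bib;

    INSERT INTO authority VALUES (10, 'Smith, John, 1900-');
    INSERT INTO bib VALUES (1, 'Title A', 10);
""")

# Users of the view see a record-like layout; the storage stays normalized.
display = conn.execute(
    "SELECT bib_id, tag, value FROM marc_like_display ORDER BY bib_id, tag").fetchall()
for bib_id, tag, value in display:
    print(bib_id, tag, value)
```

Because the view is computed from the base tables, changing the authority row changes every display line that depends on it. No stored record needs to be rewritten.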

This three-level database architecture supports the ideal of data independence: the capacity to make changes in one level or schema of the database without having to replicate those changes in other levels of the database.(6) On the one hand, the use of MARC records as a communications format on the input and/or output side dictates neither the logical view of a bibliographic database nor its internal storage structure. On the other hand, the internal storage structure must be MARC-compatible, so that data coming in from a MARC record can be transformed to be consistent with data already in the database, and data in the database can be transformed into a MARC record for output; Llorens and Trenor introduce just such a system. …
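The input and output transformations described above can be sketched as two functions. The sketch is hypothetical and not Llorens and Trenor's system. A "record" is reduced to a list of (tag, value) pairs with only tags 100 and 245, each assumed to occur at most once, with 245 assumed present. Indicators, subfields, and the leader are ignored, and the function and table names are invented. On input, an incoming heading is matched against headings already in the database so that it is stored only once. On output, a tagged record is rebuilt from the normalized rows.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authority (auth_id INTEGER PRIMARY KEY, heading TEXT UNIQUE NOT NULL);
        CREATE TABLE bib (bib_id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                          auth_id INTEGER REFERENCES authority(auth_id));
    """)
    return conn

def load_record(conn, fields):
    """Input side: map a simplified tagged record onto normalized rows,
    reusing an existing authority row when the heading is already stored."""
    data = dict(fields)  # assumes each tag occurs at most once
    auth_id = None
    heading = data.get("100")
    if heading is not None:
        # UNIQUE + INSERT OR IGNORE keeps one row per distinct heading.
        conn.execute("INSERT OR IGNORE INTO authority (heading) VALUES (?)", (heading,))
        auth_id = conn.execute(
            "SELECT auth_id FROM authority WHERE heading = ?", (heading,)).fetchone()[0]
    cur = conn.execute("INSERT INTO bib (title, auth_id) VALUES (?, ?)", (data["245"], auth_id))
    return cur.lastrowid

def export_record(conn, bib_id):
    """Output side: rebuild the simplified tagged record from normalized rows."""
    title, heading = conn.execute(
        "SELECT b.title, a.heading FROM bib b "
        "LEFT JOIN authority a ON a.auth_id = b.auth_id WHERE b.bib_id = ?",
        (bib_id,)).fetchone()
    fields = []
    if heading is not None:
        fields.append(("100", heading))
    fields.append(("245", title))
    return fields

db = make_db()
first = load_record(db, [("100", "Smith, John, 1900-"), ("245", "Title A")])
second = load_record(db, [("100", "Smith, John, 1900-"), ("245", "Title B")])
authority_rows = db.execute("SELECT COUNT(*) FROM authority").fetchone()[0]
round_trip = export_record(db, second)
print(authority_rows, round_trip)
```

Both incoming records name the same person, yet the authority table ends up with a single row. The exported record still carries the heading in its 100 field, so the communications format is preserved on output even though it is not the storage format.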