Last month, in part one of this series (April, Information Today, page 41), we described the major decision points that should be addressed before establishing a database. In this article, we will address more specific issues.
For the sake of clarity, it may be useful to begin by defining some database associated terms.
* Record--The basic unit of information, usually concerning one article, one person, or one transaction.
* File--A collection of records containing related information such as articles in a given subject area, resumes, or payroll records.
* Database--A structured collection of files. In the book Database Design for Information Retrieval (Wiley, 1987), Raya Fidel defines a database as "a single, central store of data with multiple usages."
* Field--The structure of a record is determined by the way data are broken down into specific elements such as a title or an author field. These may, in turn, be broken down into subfields such as last name, first name, and initial. In an address field the city and state could be subfields.
In larger organizations, or where multiple database exist, a Data Dictionary is usually built to define various data elements and their use. This allows for consistency across databases and throughout the organization.
* As mentioned in the previous article, the why and what type questions should be answered early on. Given satisfactory answers to such questions, the design process can now begin.
The structure of the database, the way the data elements are organized, is the initial design consideration. Only after fields, subfields, and the resultant records are laid out can input screens and output formats be created. It is often useful, however, to consider output or reporting requirements early in the design process to be sure that all of the data elements needed are actually included in the database and organized in a useful manner. At the same time the scope of the database must be addressed. Will it consist of only the currently available materials? Will an active acquisition program be mounted in order to continue acquiring similar data? What about the backlog? Is there a considerable number of retrospective documents? Will they all be input, or only a certain number acquired during a twelve month period?
The search logic to be used must be developed. This is also a function of the structure. A request to search the database by author would be meaningless, for example, if the field that contains the names included a mix of authors, editors, cited individuals, illustrators and publishers. However, it is possible to create generalized fields, such as name file, without regard to other elements. In this instance, to be useful the name field and the subfields should have a logical relationship to other data elements defined as author or illustrator, etc.
In any given database, some fields will be mandatory and others optional. This is an important consideration in setting up data entry screens and in developing the search strategies to be used.
Related issues revolve around which fields are designed to be searchable, what fields can be sorted (in what sequence), and whether numerical data will be used in calculation or ranges. Many databases allow searching in date ranges, i.e., from 1/1/85 to 1/1/92. To be able to sort dates in ranges requires that a consistent pattern be used in entering the dates. The date field/subfields are often set up so that data can only be entered in the correct format. Computers lend themselves to this type of editing, allowing the designer to set up masks or templates for data entry which check for simple errors like putting alphabetic characters in numeric fields or the reverse. Authority files or tables can also be built which serve as a screen or filter against which input data can be validated before its acceptance into the database. …