Academic journal article Library Resources & Technical Services

Using Automation and Batch Processing to Remediate Duplicate Series Data in a Shared Bibliographic Catalog

Article excerpt

Along with accuracy and comprehensiveness, consistency in cataloging practice improves discovery and identification of resources. Conversely, varying cataloging practice, whether due to local needs or changes to national standards, can result in inconsistent data within a shared bibliographic catalog. The consolidation of bibliographic databases in library consortia may exacerbate these inconsistencies. To maintain metadata quality and update older data to newer standards, catalogers can build on their traditional knowledge and also use data analysis, scripting, and batch manipulation when performing large-scale remediation.

The authors are catalogers at institutions comprising the State University Libraries (SUL) of Florida. As members of the Bibliographic Control and Discovery Subcommittee of the Council of State University Libraries, they formed the Multiple-Series Cleanup Task Force. The Task Force members were chosen for their complementary skill sets. Two of the members have extensive experience and training in cataloging practice, while the third had substantial experience with databases and systems technology before a career in librarianship. One of the members had experience developing Python scripts as a content systems analyst at a financial information provider. Another member has experience developing XSLT and JavaScript programs. Although XSLT and JavaScript were not used for this remediation project, experience with programming languages provided a conceptual understanding that assisted with interpreting the Python scripts. All the members had varying experience with data analysis, batch processing, and batch loading as part of their assignments. To aid in these efforts, they independently learned to use MarcEdit through trial and error, through webinars, and from peers. Similarly, they also learned how to take advantage of Excel's data analysis tools.

The Task Force was charged with creating a plan to remediate duplicate series data that were causing issues in the catalog's discovery tool. To fulfill its charge, the Task Force identified records in SUL's shared bibliographic database that included obsolete and duplicate series fields that caused display problems.
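As an illustration of the kind of check this identification step implies, the following is a minimal Python sketch, not the Task Force's actual method. It assumes each record is reduced to a list of (tag, text) pairs, that the obsolete series field is MARC 440, and that a "duplicate" means the same normalized series title appearing more than once among the series tracing fields (440 and 8XX). The function names and the record layout are hypothetical.

```python
from collections import Counter

# Assumption: 440 is the obsolete series field; 440 and 8XX act as series tracings.
OBSOLETE_SERIES_TAGS = {"440"}
TRACING_TAGS = {"440", "800", "810", "811", "830"}


def normalize(title):
    """Lowercase, drop punctuation, and collapse whitespace so near-identical titles match."""
    kept = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def flag_series_problems(record):
    """Return a set of problem labels for one record.

    record: list of (tag, text) tuples (a hypothetical simplified layout).
    """
    problems = set()
    tracings = [(tag, normalize(text)) for tag, text in record if tag in TRACING_TAGS]
    if any(tag in OBSOLETE_SERIES_TAGS for tag, _ in tracings):
        problems.add("obsolete")
    # Count each normalized title across all tracing fields, regardless of tag.
    title_counts = Counter(title for _, title in tracings)
    if any(count > 1 for count in title_counts.values()):
        problems.add("duplicate")
    return problems
```

A record carrying the same series title in both a 440 and an 830 field would be flagged as both obsolete and duplicate, while a record with a 490 statement paired with a single 830 tracing would pass cleanly.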

The Task Force first analyzed the records using MarcEdit and Excel, and then developed a Python script to compare a subset of the records in the shared bibliographic database of the SULs (known as the Shared Bib) with their corresponding OCLC master records. They ultimately updated the problematic Shared Bib records using a locally developed batch-loading tool. Applying these automation tools saved a significant amount of time compared with manually updating each record. The workflows and processes used for this project serve as an example of how catalogers can approach future remediation projects in an efficient and effective manner.
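The comparison step described above might look roughly like the following Python sketch. The excerpt does not say how the Task Force's script matched or compared records, so everything here is an assumption: records are keyed by OCLC number, each record is a list of (tag, text) pairs, and only the series fields are compared. All names are hypothetical.

```python
# Assumption: these MARC tags are treated as the series fields of interest.
SERIES_TAGS = {"440", "490", "800", "810", "811", "830"}


def series_fields(record):
    """Return the record's series fields as a sorted list of (tag, text) pairs."""
    return sorted((tag, text.strip()) for tag, text in record if tag in SERIES_TAGS)


def compare_to_master(local_records, master_records):
    """Compare series fields of local records with their master records.

    Both arguments are dicts mapping an OCLC number (str) to a record,
    where a record is a list of (tag, text) tuples (hypothetical layout).
    Returns a dict mapping each local OCLC number to a status string.
    """
    report = {}
    for oclc_number, local in local_records.items():
        master = master_records.get(oclc_number)
        if master is None:
            report[oclc_number] = "no master record"
        elif series_fields(local) == series_fields(master):
            report[oclc_number] = "match"
        else:
            report[oclc_number] = "differs"
    return report
```

Records reported as "differs" would be the candidates for review or batch update; the exact-match comparison is deliberately strict and would likely need a normalization step in practice.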

Literature Review

How best to incorporate quality bibliographic description into a library's catalog has been a topic of discussion in the literature for decades. (1) In 2008, Cataloging and Classification Quarterly devoted an entire special issue to the topic. (2) High-quality bibliographic description is generally defined as accurate, usable, complete, and consistent. (3) These components are needed for a positive impact on the user experience. Petrucciani writes about the need for consistency and accuracy as prerequisites for establishing trust among users that the catalog will provide "clear and effective navigation functions among controlled bibliographic entities." (4) Dunsire states, "The efficiency and effectiveness of any information retrieval service requires coherency and consistency in metadata." (5) Harmon acknowledges the direct relationship between the presence of information in the bibliographic record and the library users' retrieval of that record in the discovery interface, and asserts that it is the cataloger's responsibility to support the organization's public service mission in providing access to research materials. …
