Magazine article Computers in Libraries

Developing Access to Electronic Texts in the Humanities

Article excerpt

Humanities, Full-text, SGML, Cataloging, Network

The Center for Electronic Texts in the Humanities (CETH) was established in 1991 by Rutgers and Princeton Universities to provide a national focus for those who are involved in the creation, dissemination, and use of electronic texts and resources in the humanities. These resources may be literary works, historical documents, manuscripts, papyri, inscriptions, transcriptions of spoken texts, or dictionaries, and may be in any natural language.

For the last thirty or so years, humanities scholars have used electronic texts for many scholarly applications: concordances, word-frequency indexes, analyses of style, the production of historical dictionaries, and, more recently, models of the narrative structures of literature. Yet we have almost no recognized procedures for providing access to the texts or for their long-term maintenance.

Until now, in North America, there has been no concerted endeavor to bring the use of electronic texts into the center of the scholarly arena by building on existing resources and knowledge. Electronic texts need to be accessible to any scholar or institution through the library environment in a standard and recognized form, which will supplement and enhance the traditional modes of humanities scholarship.

CETH will take a leading role in formulating effective methodologies for developing, maintaining, and using electronic texts created by individual scholars or projects. It will achieve this objective by setting up a consortium of member institutions that will work together to establish a framework for advancing scholarship in the humanities by the use of high-quality electronic texts.

The term electronic text is used here to mean a searchable text, which permits many more uses in humanities scholarship than the representation of text as an image, although we believe that a combination of text and image formats will eventually present an enhanced view of the text. Electronic texts may be in plain text (often called ASCII) format, which can be manipulated by whatever software is chosen, or they may be indexed for specific software and usable only with that software.
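To make the point about plain text concrete, the following is a minimal sketch, in Python, of two of the applications mentioned earlier: a word-frequency index and a simple keyword-in-context concordance. The sample sentence and the function names (`tokenize`, `word_frequencies`, `concordance`) are invented for illustration and do not come from any CETH software; any plain-text file could stand in for the sample.

```python
import re
from collections import Counter

# Hypothetical sample text; any plain-text (ASCII) file could stand in for it.
TEXT = (
    "The whale surfaced near the ship. "
    "The crew watched the whale until the whale dived."
)

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text):
    """Return a Counter mapping each word to its number of occurrences."""
    return Counter(tokenize(text))

def concordance(text, keyword, width=2):
    """Return keyword-in-context lines: `width` words either side of each hit."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

if __name__ == "__main__":
    print(word_frequencies(TEXT).most_common(3))
    for line in concordance(TEXT, "whale"):
        print(line)
```

Because the text is held as plain characters rather than as a page image, nothing here depends on a particular retrieval package; the same file could be passed to any other program.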

It is estimated that about 95 percent of existing texts are in plain text format; these are in the hands of individuals or research institutes and have been compiled for specific projects. The remainder are mostly commercially available products, usually on CD-ROM, and are the ones now most often seen in libraries. Electronic texts become much more useful when additional information, such as author, title, and chapter, or features such as quotations, proper names, and parts of speech, are marked in some way.
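As a rough illustration of why marked features matter, the sketch below tags an invented passage and then retrieves every proper name and quotation directly, something a plain-text search could only approximate. The tag names (`author`, `title`, `chapter`, `name`, `quote`) and the passage are assumptions made for this example, not the elements of any particular encoding scheme, and XML-style tags are used only because Python's standard library can parse them.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up sample; tag names are illustrative only.
SAMPLE = """
<text>
  <author>A. N. Author</author>
  <title>A Sample Tale</title>
  <chapter n="1">
    <p><name>Mary Lee</name> lived with her father at <name>Oakfield</name>.
    <quote>We shall miss her,</quote> said <name>Mr. Lee</name>.</p>
  </chapter>
</text>
"""

def marked_features(root, tag):
    """Return the text of every element with the given tag, whitespace-normalized."""
    return [" ".join("".join(el.itertext()).split()) for el in root.iter(tag)]

root = ET.fromstring(SAMPLE.strip())

if __name__ == "__main__":
    print("Author:", root.findtext("author"))
    print("Title:", root.findtext("title"))
    print("Names:", marked_features(root, "name"))
    print("Quotations:", marked_features(root, "quote"))
```

The same lookup works for any feature that has been marked, which is why a single agreed encoding is more valuable than many private ones.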

There are now at least thirty different methods of encoding such features, but a new common format, that of the Text Encoding Initiative (TEI), is being developed. A further problem to be addressed is that many existing texts also suffer from inadequate documentation and unclear copyright situations.

Cataloging Electronic Texts

Since most electronic texts have been compiled for specific projects, it is not surprising that information about these texts is scarce. Those who have been responsible for compiling electronic texts have in general had no experience of using bibliographic records to catalog them. Since no other ground rules exist, they have developed their own ad hoc procedures for documenting the texts, or have provided no documentation at all.

The only attempt to create a systematic catalog using standard bibliographic procedures is the Rutgers Inventory of Machine-Readable Texts in the Humanities, which has now been taken over by CETH. Established in 1983, the Inventory is held on the Research Libraries Information Network (RLIN). It contained some 1,600 entries when CETH was established in 1991. About half of these are from the Oxford Text Archive, a collection built up at Oxford University Computing Service to prevent texts from becoming "lost" once their compilers have finished with them. …
