In 1991 Princeton and Rutgers universities announced the establishment of the Center for Electronic Texts in the Humanities (CETH), a collaborative project between the state university and the major private university in New Jersey. CETH was to provide a national focus for those involved in the creation, dissemination, and use of electronic texts in the humanities.

The concept for the Center grew out of nearly eight years of work at Rutgers University Libraries on the Inventory of Machine-Readable Texts in the Humanities, and out of the needs that emerged from that experience. The Inventory, begun in 1983 with funding from the Council on Library Resources, provides an online catalog of electronic texts of potential use to researchers in various humanities disciplines. It is maintained on RLIN, the online network of the Research Libraries Group, and is available internationally. By locating texts through the Inventory, a researcher can save considerable time: transcribing and encoding a printed text into an electronic format was, and still is, time-consuming and tedious, and it detracts from research time that could be spent using the text. The Inventory permits large-scale resources to be shared by many who have differing theoretical objectives and are working in diverse computing environments. With the exception of the Oxford Text Archive, begun in 1976, there have been few systematic efforts to make existing electronic texts available for other scholars to use.

Earlier compilations of texts were done in an ad hoc fashion for individual or group research projects or were the by-products of a concordance, dictionary, or critical edition publication. There were no recognized procedures for providing access to the texts by others or maintaining them for the long term. There were few, if any, commercially published texts. As a result, there were no published bibliographies to document the existence of the texts, as they were in the hands of a few individuals or research institutes. CETH estimates that approximately 95 percent of existing texts are in this uncataloged form.

When sources of texts were identified and compilers were asked to send documentation from which a catalog record could be created, several other challenges arose. Most projects were not adequately staffed to provide the desired information. Neither large nor small projects had foreseen their data being used elsewhere, and they had not documented their texts extensively. In many cases, when the inventory survey data were received, the information was incomplete or difficult to use. Reference to the source text on which the electronic version was based was often missing, and encoding practices varied widely, usually depending on the focus of the particular research project. It became increasingly clear that the reusability of these texts would depend heavily on the documentation of their quality and that standards for encoding and interchange were desperately needed.

After data had been gathered for the Inventory and after discussions with humanities scholars, other barriers to advancing scholarship in the humanities through the use of high-quality electronic texts emerged: difficulty in locating electronic texts that could be used or adapted for research or teaching; a lack of standards for text encoding and interchange that would allow high-quality texts to be produced and shared by others who might or might not be using the same software; the need for better software tools that would advance methodologies in humanities computing beyond those in use for the last thirty or so years; and the lack of educational programs for those interested in developing skills in using computing for humanities research. …

