Academic journal article Journal of Biblical Literature

Computerized Source Criticism of Biblical Texts

Article excerpt

We have developed an automated method to separate biblical texts according to author or scribal school. At the core of this new approach is the identification of correlations in word preference that are then used to quantify stylistic similarity between sections. In so doing, our method ignores literary features (such as possible repetitions, narrative breaks, and contradictions) and focuses on the least subjective criterion employed by Bible scholars to identify signs of composition. The computerized system is unique in its ability to consider subtle stylistic preferences in aggregate, whereas human scholars are generally limited to cases where a word preference is pronounced. Our method is also less liable to accusations of bias, thanks to its reliance on context-independent criteria. Its efficacy is demonstrated in its successful deconstruction of an artificial book, Jer-iel, made up of randomly interleaved snippets from Jeremiah and Ezekiel. When applied to Genesis-Numbers, the method divides the text into constituents that correlate closely with common notions of "Priestly" and "non-Priestly" material. No such corroboration is forthcoming for the classic Yahwistic/Elohistic division.


"We instruct the computer to ignore what we call grammatical words: articles, prepositions, pronouns, modal verbs, which have a high frequency rating in all discourse. Then we get to the real nitty-gritty, what we call the lexical words, the words that carry a distinctive semantic content. Words like love or dark or heart or God. Let's see." So he taps away on the keyboard and instantly my favourite word appears on the screen.

- David Lodge, Small World (1984)

In this article, we introduce a novel computerized method for source analysis of biblical texts. The matter of the Pentateuch's composition has been the subject of some controversy in modern times. From the late nineteenth century until recent years, the Documentary Hypothesis was the most prevalent model among Bible scholars. Since then, scholars have increasingly called into question the existence of some or all of the postulated documents. Many prefer a supplementary model to a documentary one, while others believe the text to be an amalgam of numerous fragments. The closest thing to a consensus today (and it too has its detractors) is that there exists a certain meaningful dichotomy between Priestly (P) and non-Priestly texts.1

The various source analyses that have been proposed to date are based on a combination of literary, historical, and linguistic evidence. Our research is a first attempt to put source analysis on as empirical a footing as possible by marshaling the most recent methods in computational linguistics. The strength of this approach lies in its "robotic" objectivity and rigor. Its weakness is that it is limited to certain linguistic features and does not take into account any literary or historical considerations.

Though this work does not address the question of editorial model, we do hope it might contribute to the fundamental issue of literary origins. For cases in which scholars have an idea how many primary components are present, our new algorithmic method can disentangle the text with a high degree of confidence.

The method is a variation on one traditionally employed by biblical scholars, namely, word preference. Synonym choice can be useful in identifying schools of authors, as well as individuals. Occurrences in the text of any one word from a set of synonyms are, however, relatively infrequent. Therefore, synonyms are useful for teasing out some textual units, but not all. Accordingly, we use a two-stage process. We first find a reliable partial source division based on synonym usage. (Only a preference of one term over its alternative is registered; the context in which it is used is ignored.) In the second stage, we analyze this initial division for more general lexical preferences and extrapolate from these to obtain a more complete and fine-grained source division. …
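The article gives no implementation details, but the two-stage idea described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the synonym pairs, the toy text units, and the use of cosine similarity over raw word counts as the measure of "general lexical preference."

```python
from collections import Counter
import math

# Hypothetical synonym pairs (placeholders, not the study's actual sets):
# a unit that consistently picks one member of a pair over the other
# is treated as showing a source-level word preference.
SYNONYM_PAIRS = [("stone", "rock"), ("garment", "cloak")]

def seed_label(unit_words):
    """Stage 1: label a unit 'A' or 'B' only when every synonym pair it
    actually uses points the same way; otherwise leave it unlabeled."""
    votes = []
    for a, b in SYNONYM_PAIRS:
        ca, cb = unit_words.count(a), unit_words.count(b)
        if ca > cb:
            votes.append("A")
        elif cb > ca:
            votes.append("B")
    if votes and all(v == votes[0] for v in votes):
        return votes[0]
    return None

def cosine(c1, c2):
    """Cosine similarity between two word-frequency Counters."""
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def split_sources(units):
    """Two-stage division: seed a partial split by synonym preference,
    then assign the remaining units to whichever seeded lexical
    profile they most resemble."""
    labels = [seed_label(u) for u in units]
    profiles = {"A": Counter(), "B": Counter()}
    for u, lab in zip(units, labels):
        if lab:
            profiles[lab].update(u)
    for i, (u, lab) in enumerate(zip(units, labels)):
        if lab is None:
            sims = {k: cosine(Counter(u), p) for k, p in profiles.items()}
            labels[i] = max(sims, key=sims.get)
    return labels
```

A fuller sketch would, as the epigraph suggests, strip high-frequency grammatical words before building the stage-two profiles, so that the extrapolation rests on semantically distinctive vocabulary rather than on articles and prepositions.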
