Quantitative Analysis of Literary Styles. (General)

By Peng, Roger D.; Hengartner, Nicolas W. | The American Statistician, August 2002

1. INTRODUCTION

It is often recognized that authors have inherent literary styles which serve as "fingerprints" for their written works. Thus, in principle, one should be able to determine the authorship of unsigned manuscripts by carefully analyzing the style of the text. The difficulty lies in characterizing the style of each author, that is, determining which sets of features in a text most accurately summarize an author's style. When doing a quantitative or statistical analysis of literary style, the problem is finding adequate numerical representations of an author's inherent style.

Quantitative literary style analysis presents a unique opportunity to introduce and motivate many standard multivariate techniques. It is possible to view each text as a collection of multivariate observations, in which case we are immediately faced with the inherent difficulties of analyzing high-dimensional data. The usual questions are relevant: How can we visualize the data? What are the significant features? Are there any interesting structures? In this situation we also have the benefit of being able to rely on some immediate knowledge of the subject matter to analyze and understand the data. Traditional multivariate methods can then be used to contrast and compare the styles of several authors and possibly assign authorship.

1.1 Previous Work

There has been much work covering different aspects of this field. For a comprehensive review we direct the reader to Holmes (1985). Many early attempts to quantify style relied on concordances, or inventories of the frequency of every word in a text. In 1901, T. C. Mendenhall reduced the concordances of Shakespeare and Bacon to distributions of word lengths and plotted these distributions as graphs. His so-called "characteristic curves" serve as an early example of the use of graphics in distinguishing authorship. Mendenhall examined the differences in the shapes of the curves (such as the location of the mode) and suggested that it was unlikely that Bacon wrote any of Shakespeare's disputed works. However, C. B. Williams reproduced some of Mendenhall's curves and noted that Mendenhall's conclusions may have been too strong. In fact, there was little evidence for or against the theory that some works attributed to Shakespeare could have been written by Bacon (Williams 1975). Brinegar (1963) also used word length distributions to determine whether Mark Twain had written the Quintus Curtius Snodgrass (QCS) letters. He used chi-square tests and two-sample t-tests on the counts of two-, three-, and four-letter words to check the agreement of the QCS letters with Twain's known writings. Thisted and Efron (1987) used the idea of vocabulary richness to assess the possibility of Shakespearean authorship of a newly discovered poem. They based their analysis of the poem on the rate of "discovery" of new words given the number of distinct words previously observed in the Shakespearean canon. Holmes (1992), in an example of the use of a standard multivariate analysis technique, used hierarchical cluster analysis to detect changes in authorship in Mormon scripture. He also used various measures of vocabulary richness to conduct his analysis.
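To make the word-length approach concrete, the following is a minimal sketch of how a word-length distribution (in the spirit of a "characteristic curve") and a Pearson chi-square statistic on word-length counts might be computed. It is not the procedure used by Mendenhall or Brinegar; the function names, the crude letters-only tokenizer, and the pooling of long words into a single bin are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_length_distribution(text, max_len=12):
    """Relative frequency of word lengths 1..max_len.

    Assumption: a "word" is a run of ASCII letters, so apostrophes and
    hyphens split words. Words longer than max_len are pooled into max_len.
    """
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    return {k: counts.get(k, 0) / total for k in range(1, max_len + 1)}


def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x k table of counts.

    Each argument is a list of counts over the same k categories
    (for example, the numbers of 2-, 3-, and 4-letter words in two texts).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must have the same length")
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    stat = 0.0
    for obs_a, obs_b in zip(counts_a, counts_b):
        col = obs_a + obs_b
        if col == 0:
            continue  # an empty category contributes nothing
        exp_a = n_a * col / n  # expected count under homogeneity
        exp_b = n_b * col / n
        stat += (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return stat
```

Under the hypothesis that both texts share the same word-length distribution, the statistic is compared against a chi-square distribution with k - 1 degrees of freedom; that comparison is omitted here to keep the sketch free of external dependencies.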

There is no general agreement on the unit of analysis that should be used in authorship studies. In the previously mentioned examples, word length and vocabulary richness were the units used. Williams (1940) analyzed the sentence lengths of works written by Chesterton, Wells, and Shaw. He noticed that the log of the number of words per sentence appeared to follow a normal distribution. Morton (1965) also used sentence length in his analysis of ancient Greek texts. After initially using criteria such as word length and sentence length, Mosteller and Wallace (1963) focused on using function word counts to discriminate between the works of Hamilton and Madison in their seminal analysis of the Federalist Papers (see also Mosteller and Wallace 1964).
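As a rough illustration of two of these units of analysis, the sketch below computes log sentence lengths and function word rates for a text. It is an assumption-laden example, not the method of Williams or of Mosteller and Wallace: the sentence splitter simply breaks on terminal punctuation, the function names are hypothetical, and the list of function words is supplied by the caller.

```python
import math
import re


def log_sentence_lengths(text):
    """Natural log of the number of words in each sentence.

    Assumption: sentences end at '.', '!' or '?', which mishandles
    abbreviations and quoted speech.
    """
    sentences = re.split(r"[.!?]+", text)
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return [math.log(n) for n in lengths if n > 0]


def function_word_rates(text, function_words, per=1000):
    """Occurrences of each function word per `per` words of text."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    if not words:
        return {fw: 0.0 for fw in function_words}
    return {fw: per * words.count(fw.lower()) / len(words)
            for fw in function_words}
```

For example, `function_word_rates(sample, ["the", "upon"])` returns the rate of each listed word per 1,000 words of `sample`; the choice of words here is purely illustrative. A histogram of the values returned by `log_sentence_lengths` gives a quick visual check of whether sentence lengths look roughly log-normal in a given text.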
