Academic journal article Journal of American Folklore

Inferring Propp's Functions from Semantically Annotated Text

Vladimir Propp's Morphology of the Folktale, published in 1928 and first translated into English in 1958 (Propp [1928] 1968), is a seminal work in folkloristics, having ushered in an era of structuralism, provided a template for later studies of the narrative structure of folklore, and inspired generations of folklorists. As one of the most precise formulations of narrative structure to date, Propp's morphology presents a compelling subject for machine learning. It would be of wide-ranging interest if a morphology could be automatically and reliably extracted from a given set of folktales. For folklorists and literary theorists, such a tool would be invaluable for comparison, indexing, and classification. For cultural anthropologists, it would provide a new technique for studying culture and its variations across time and space. For cultural psychologists, it would point the way to new experiments investigating culture and its impact on thought. For cognitive scientists, it would serve as a model of how abstractions are understood from texts, and of the nature of narrative understanding. For computational linguists, it would be a step toward understanding the higher-level meaning of natural language. And for researchers in artificial intelligence and machine learning, it would represent an advance in our ability to extract deep structure from complex datasets. Each of these fields would naturally also take an interest in advances made in the others.

Unfortunately, the extraction of morphologies has until now remained a manual task, the purview of scholars such as A. J. Greimas, Claude Lévi-Strauss, Alan Dundes, and, of course, Vladimir Propp. Constructing a morphology for a particular set of folktales takes many years of reading and analysis. It is unclear how much the finished morphology owes to the folklorist's personal biases or familiarity with other extant morphologies, rather than being a true reflection of the character of the tales under investigation. Furthermore, blind reproduction or validation of a morphological analysis is prohibitively difficult, requiring a scholar with the necessary skills to retrace the years-long path of reading, analysis, and synthesis required to generate a morphology by hand.

I demonstrate a technique that gives computational purchase on the problem of identifying a morphology from a given set of stories. The algorithm is a modification of a machine learning technique called model merging (Stolcke and Omohundro 1994), and it uses a set of rules derived from Propp's descriptions of his own process for finding similarities between tales. The algorithm takes as its data semantically annotated texts: folktales whose surface semantics have been encoded in a computer-readable representation. For this particular demonstration, the data are a selection of single-move Russian fairy tales analyzed by Propp and translated into English. Importantly, the encoding of the surface semantics of the texts is human-assisted; the actual learning of the identities of Propp's functions is done by computer.
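The general shape of model merging can be sketched in Python. Stolcke and Omohundro's method starts from a maximally specific model, with one state per observed item, and repeatedly merges states, choosing merges by a Bayesian posterior score. The sketch below keeps that skeleton (start specific, then merge until nothing more applies) but replaces the probabilistic scoring with a simple yes/no rule check, loosely in the spirit of rule-driven merging. The tale encoding (lists of event-type strings), the `same_type` rule, and every function name are hypothetical illustrations. They are not the article's annotation scheme or its Propp-derived merge rules.

```python
from itertools import combinations

def initial_model(tales):
    """Maximally specific model: one state per annotated event, chained per tale."""
    members = {}   # state id -> list of (tale index, event position) it covers
    label = {}     # state id -> hypothetical semantic event type
    edges = set()  # directed transitions (from_state, to_state)
    sid = 0
    for t, tale in enumerate(tales):
        prev = None
        for pos, event_type in enumerate(tale):
            members[sid] = [(t, pos)]
            label[sid] = event_type
            if prev is not None:
                edges.add((prev, sid))
            prev = sid
            sid += 1
    return members, label, edges

def same_type(a, b, label):
    """Hypothetical merge rule: allow a merge when both states share an event type."""
    return label[a] == label[b]

def merge(members, edges, keep, drop):
    """Fold state `drop` into state `keep` and redirect its transitions."""
    members[keep].extend(members.pop(drop))
    return {(keep if x == drop else x, keep if y == drop else y) for x, y in edges}

def model_merge(tales, rule=same_type):
    """Greedily apply `rule` to state pairs until no further merge is allowed."""
    members, label, edges = initial_model(tales)
    merged = True
    while merged:
        merged = False
        for a, b in combinations(sorted(members), 2):
            if rule(a, b, label):
                edges = merge(members, edges, a, b)
                del label[b]
                merged = True
                break  # the state set changed, so restart the pairwise scan
    return members, label, edges

# Two toy "tales" encoded as sequences of invented event types.
tales = [
    ["abduct", "leave", "fight", "defeat", "go_home"],
    ["steal", "leave", "defeat", "go_home"],
]
members, label, edges = model_merge(tales)
print(len(members))  # 6
```

Here the two `leave`, `defeat`, and `go_home` events each collapse into a shared state. `abduct` and `steal` stay apart because the toy rule compares only surface labels, even though an analyst might group them together. Bridging that gap is the kind of generalization that richer merge rules have to supply.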

The paper is organized as follows. First, I explain the machine learning problem at hand, pointing out those parts of Propp's theory that I will target for learning. Second, I describe the structure of the learning technique used and how it differs from regular model merging. Third, I describe the data used in the experiment, including the texts, the semantic annotation schemes, and the gold standard data (Propp's analyses) against which the performance of the algorithm was measured. Fourth, I lay out the set of merge rules, derived from Propp's descriptions, that work within the model-merging framework to reproduce a substantial portion of Propp's functions. Finally, I describe the performance of the algorithm in extracting the identities of Propp's functions.

Learning Target

Propp's morphology comprises a set of character categories plus three levels of plot structure: gross structure (moves), intermediate structure (functions), and fine structure (what I will here call subtypes, since Propp himself had no specific term for them). …
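The three levels of plot structure can be pictured as a nested data structure. The Python sketch below is one assumed way to represent a tale's analysis, not the representation used in the study. The class names, the placeholder function symbols, and the sample tale are all hypothetical. Flattening a tale to its function sequence isolates the intermediate level, the functions, whose identities the algorithm is meant to learn.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FunctionInstance:
    """Intermediate level: one occurrence of a function in a tale."""
    symbol: str                    # function label (placeholder symbols below)
    subtype: Optional[str] = None  # fine level: the variants called "subtypes" here

@dataclass
class Move:
    """Gross level: a move is an ordered sequence of function instances."""
    functions: List[FunctionInstance] = field(default_factory=list)

@dataclass
class TaleAnalysis:
    """A whole tale's analysis: an ordered list of moves."""
    tale_id: str
    moves: List[Move] = field(default_factory=list)

    def function_sequence(self) -> List[str]:
        """Flatten the analysis to its intermediate level, ignoring subtypes."""
        return [f.symbol for m in self.moves for f in m.functions]

# An invented single-move tale with placeholder symbols.
tale = TaleAnalysis("tale-1", [Move([
    FunctionInstance("A", subtype="1"),
    FunctionInstance("H"),
    FunctionInstance("I"),
    FunctionInstance("K"),
])])
print(tale.function_sequence())  # ['A', 'H', 'I', 'K']
```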
