Magazine article Humanities

The American Language: A Historical Database of English in the U.S.

While this bon mot has the advantage of being admirably succinct, it leaves something to be desired in terms of completeness. What other differences are there between a job and a career?

For one thing, the word job is considerably more common than career, as the former has been used by American writers more than twice as often over the past two hundred years. Additionally, jobs tend to be viewed as less desirable, no matter how many additional hours a career may require.

When one looks at the adjectives most commonly used to describe a job, the list includes dirty, lousy, and toughest. The corresponding set of adjectives used to describe a career includes glorious, illustrious, and distinguished. Careers are far more likely to be artistic (literary, operatic, or dramatic), whereas jobs are more likely to be pedestrian (tedious, thankless, or steady).

One rarely hears of anyone having a steady career. While the descriptors affixed to jobs cover a wide range of ground, they much more frequently refer to some sort of necessary and ungratifying work, whereas careers appear to be viewed as much more fulfilling.

How do we know all this? Has a team of researchers been painstakingly keeping tabs on how people use these words ever since Thomas Jefferson was in the White House? Is it through some multiyear, large-scale survey taken of the writing habits of the American people? No, it is by taking sixty seconds or so to run a search on a publicly accessible website, the Corpus of Historical American English (COHA).

COHA was created in 2009 by Mark Davies, a linguistics professor at Brigham Young University, and it is the largest tagged and searchable corpus of historical English available today. Containing hundreds of millions of words spread out from the beginning of the nineteenth century to the end of the twentieth, it is evenly distributed between fiction and nonfiction (and each of these categories is drawn from a variety of genres), and is free to all, requiring nothing more than your inquiry. It allows linguistic researchers, scholars of other fields, and anybody who has more than a passing interest in language to discover subtleties about how we use words that would have been impossible to find until very recently. It is a marvelous trove of linguistic data, and shines light on thousands of aspects of that peculiar variety of language, American English.

Before we look at all that makes COHA unusual and interesting, we should first look at what a linguistic corpus is. Corpus, in Latin, simply means "body," a sense that is in large measure preserved in many English words that derive from it and are in use today. A corporation is a body of people united in a business sense, a corporal is a non-commissioned military officer who leads a body of troops, and someone who is corpulent has a large body. Hence, a linguistic corpus is just another kind of body: a body of language.

With a few exceptions, linguistic corpora are a relatively recent addition to the study of language. The first ones were of necessity small and limited in usefulness, as they were compiled by hand. In the late nineteenth century, the German psychologist William Preyer studied early language acquisition by creating a corpus of words that parents had written down when their children used them. In addition to various forms of mother and father, the words bird, sugar, and hair were apparently popular with German infants at this time. Also in the nineteenth century, the original editors of the Oxford English Dictionary relied on a somewhat corpus-based approach to their dictionary, as the bulk of that work is made up of millions of citations that were all originally written out on little slips of paper. The slips were then sorted, according to the word each citation was meant to illustrate, into thousands of pigeonholes built into an enormous unheated iron shed.

Creating a corpus without a computer required an exhausting commitment of time and energy. …
