Corpora and sociolinguistic variation
This chapter considers how corpus linguists have tried to answer questions about the ways in which different types of people use language. Such studies have often used spoken corpus data (although written corpora can also be considered) in which speakers have been annotated for demographic variables such as sex, age and social class. Additionally, I take a first look at Biber’s influential multi-dimensional analysis approach, a method of identifying the main ways that various registers in a particular language differ from each other. Biber’s approach crops up at various points in later chapters, so it is useful to outline it here. I then consider studies using corpora that have been annotated with phonetic or prosodic information in order to describe or compare the language use of speakers of different dialects or ethnolects. The chapter also contains a warning about the dangers of over-interpreting simple frequencies and the need to provide an explanation for differences.
The variationist approach in sociolinguistics is typified by researchers like Labov (1966, 1972b), Cheshire (1982), Trudgill (1984) and Milroy and Milroy (1993). In general, the language use of one or more identity groups is charted by examining the presence (or absence) of particular linguistic variables. Such variables can be prosodic, phonetic, lexical, grammatical, discoursal or pragmatic. Written or spoken language production can be examined, although many sociolinguists have tended to focus on spoken language use. Language users are often divided into one or more discrete demographic categories based on the identities that they hold. For example, using sex as a variable, we could compare male speakers against female speakers. Many sociolinguistic studies attempt to take multiple variables into account, for example, categorising people according to combinations of sex, age, social class, occupation, geographic location, sexuality, etc.
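To make the idea of comparing a linguistic variable across demographic groups concrete, the following is a minimal sketch in Python. It assumes a corpus reduced to pairs of speaker category and utterance. The toy utterances, the function name relative_frequency and the choice of the word "lovely" as the variable are all invented for illustration and are not drawn from any real corpus or study. Counts are normalised per 1,000 tokens so that groups contributing different amounts of speech can be compared.

```python
from collections import defaultdict

# Hypothetical toy data: (speaker sex, utterance) pairs standing in for a
# demographically annotated spoken corpus. Not taken from any real corpus.
utterances = [
    ("female", "well I think it was lovely really"),
    ("male", "I reckon it was all right"),
    ("female", "it was lovely and she was lovely too"),
    ("male", "it was lovely I suppose"),
]


def relative_frequency(utterances, target, per=1000):
    """Return occurrences of `target` per `per` tokens for each group."""
    hits = defaultdict(int)    # raw count of the target word per group
    totals = defaultdict(int)  # total number of tokens per group
    for group, text in utterances:
        # Naive whitespace tokenisation; a real study would use a proper tokeniser.
        tokens = text.lower().split()
        totals[group] += len(tokens)
        hits[group] += tokens.count(target)
    return {group: hits[group] / totals[group] * per for group in totals}


freqs = relative_frequency(utterances, "lovely")
for group, value in sorted(freqs.items()):
    print(f"{group}: {value:.1f} per 1,000 tokens")
```

A difference between the two figures is only a starting point. As noted above, simple frequencies are easy to over-interpret, and any difference between groups still needs to be explained.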
One approach that has been taken by some variationists is to elicit data. For example, in a famous study, Labov (1966) visited three Manhattan department