Extensions of the Coalescent Effective Population Size
Wakeley, John, Sargsyan, Ori, Genetics
We suggest two extensions of the coalescent effective population size of SJÖDIN et al. (2005) and make a third, practical point. First, to bolster its relevance to data and allow comparisons between models, the coalescent effective size should be recast as a kind of mutation effective size. Second, the requirement that the coalescent effective population size must depend linearly on the actual population size should be lifted. Third, even if the coalescent effective population size does not exist in the mathematical sense, it may be difficult to reject Kingman's coalescent using genetic data.
MODERN population genetics is data driven and yet relies on modeling to capture the long-term interaction of forces shaping genetic variation. Data are interpreted by comparing observed patterns of variation to the predictions of mathematical models. Minimally, these models incorporate mutation and random genetic drift, but often include other factors, such as population structure and natural selection. The standard neutral coalescent process (KINGMAN 1982; HUDSON 1983; TAJIMA 1983), also known as Kingman's coalescent, is the accepted null model for the initial interpretation of data. For this reason, SJÖDIN et al. (2005) argued that Kingman's coalescent is a more relevant idealized model for discussions of effective population size than the traditional Wright-Fisher model (FISHER 1930; WRIGHT 1931).
The idea of effective population size is to map a given population onto a simpler well-known model of a population. The effective size of a population is often defined loosely as the corresponding size of a Wright-Fisher population that would have the same "rate of genetic drift." Several different definitions of effective population size have been proposed on the basis of single measures of the rate of genetic drift or single measures of polymorphism, such as heterozygosity (CROW and KIMURA 1970; EWENS 1982, 1989). As SJÖDIN et al. (2005) point out, an effective size based on convergence to Kingman's coalescent is preferable because its existence implies that all aspects of genetic variation should conform to the predictions of Kingman's coalescent, meaning that any statistical test applied to data should reject the model only at the nominal level.
A coalescent effective size is also preferable because Kingman's coalescent has been shown to hold for a surprisingly wide variety of population models (KINGMAN 1982; MÖHLE 1998; NORDBORG and KRONE 2002), including the Wright-Fisher model and many others. In short, the complicated details of many populations disappear in the limit as the population size N tends to infinity, with time rescaled appropriately, so that the ancestry of a sample is determined by a very simple process. Each pair of lineages ancestral to the sample coalesces independently with rate 1 and each single lineage experiences mutations independently with rate θ/2. Note that defining an effective population size N_e in this context means we are interested only in its value or behavior asymptotically as the population size N tends to infinity.
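The limiting process just described (each pair of lineages coalescing at rate 1, each lineage mutating at rate θ/2) can be illustrated with a short simulation. The following is a minimal sketch, not code from the paper: the function names, the sample size, and the value of θ are hypothetical choices made for illustration, and time is measured in the rescaled units of the limit. With k lineages there are k(k − 1)/2 pairs, so the waiting time to the next coalescence is exponential with that rate, and mutations fall on the genealogy as a Poisson process with rate θ/2 per unit of branch length.

```python
import random


def simulate_coalescent(n, theta, rng):
    """Simulate one genealogy of n sampled lineages under Kingman's coalescent.

    Returns (tmrca, total_branch_length, num_mutations), with time in
    rescaled coalescent units. All names here are illustrative assumptions.
    """
    k = n
    tmrca = 0.0
    total_length = 0.0
    while k > 1:
        # Each of the k*(k-1)/2 pairs coalesces independently at rate 1.
        pair_rate = k * (k - 1) / 2.0
        t = rng.expovariate(pair_rate)
        tmrca += t
        total_length += k * t  # k lineages persist for time t
        k -= 1

    # Mutations: a Poisson process with rate theta/2 along the total branch
    # length, sampled by accumulating exponential inter-arrival distances.
    mutations = 0
    if theta > 0:
        position = rng.expovariate(theta / 2.0)
        while position < total_length:
            mutations += 1
            position += rng.expovariate(theta / 2.0)
    return tmrca, total_length, mutations


def expected_tmrca(n):
    """E[TMRCA] = sum over k = 2..n of 2 / (k(k-1)) = 2(1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


def expected_mutations(n, theta):
    """E[number of mutations] = theta * sum over i = 1..n-1 of 1/i."""
    return theta * sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    rng = random.Random(2024)   # arbitrary seed, for reproducibility only
    n, theta, reps = 10, 5.0, 10000  # hypothetical illustration values
    runs = [simulate_coalescent(n, theta, rng) for _ in range(reps)]
    print("mean TMRCA:", sum(r[0] for r in runs) / reps,
          "expected:", expected_tmrca(n))
    print("mean mutations:", sum(r[2] for r in runs) / reps,
          "expected:", expected_mutations(n, theta))
```

Under the additional assumption that every mutation hits a new site (infinite sites), the mutation count equals the number of segregating sites in the sample, so the simulated mean can be compared directly with polymorphism data.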
We include mutation in "Kingman's coalescent" and argue that this is crucial because, without mutation, Kingman's coalescent (or any other model) cannot make predictions about genetic variation. The mutation parameter is defined as θ = 2N_eµ for haploids and θ = 4N_eµ for diploids, where µ is the mutation probability during meiosis at a locus under study. In cases where the complicated details of a population collapse to Kingman's coalescent as N → ∞, we advocate calling this N_e in θ the coalescent effective population size. This can be seen as a type of mutation effective size (EWENS 1989), which differs from previous definitions (MARUYAMA and KIMURA 1980; WHITLOCK and BARTON 1997; CHARLESWORTH 2001; PANNELL 2003) in that it applies to the parameter of the entire ancestral process, with its manifold predictions about data, rather than just to single measures of variation such as the heterozygosity of the population. …
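As a small worked illustration of the definition θ = 2N_eµ (haploids) and θ = 4N_eµ (diploids), the sketch below converts between θ and N_e. The function names and the numerical values are hypothetical and chosen only for illustration; in practice θ would be estimated from data and µ would come from an independent source.

```python
# Multiplier relating theta to N_e * mu, as defined in the text.
PLOIDY_FACTOR = {"haploid": 2, "diploid": 4}


def theta_from_ne(ne, mu, ploidy="diploid"):
    """Return theta = 2*N_e*mu (haploid) or 4*N_e*mu (diploid)."""
    return PLOIDY_FACTOR[ploidy] * ne * mu


def ne_from_theta(theta, mu, ploidy="diploid"):
    """Invert the definition: N_e = theta / (factor * mu)."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return theta / (PLOIDY_FACTOR[ploidy] * mu)


if __name__ == "__main__":
    # Hypothetical values: per-locus mutation probability and a theta estimate.
    mu = 2.5e-8
    theta_hat = 0.001
    print("diploid N_e:", ne_from_theta(theta_hat, mu, "diploid"))
    print("haploid N_e:", ne_from_theta(theta_hat, mu, "haploid"))
```

For the same θ and µ, the haploid definition gives an N_e twice the diploid one, which reflects only the convention of counting individuals rather than gene copies.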