Academic journal article Genetics

Estimating the Genomewide Rate of Adaptive Protein Evolution in Drosophila

Article excerpt


When polymorphism and divergence data are available for multiple loci, extended forms of the McDonald-Kreitman test can be used to estimate the average proportion of the amino acid divergence due to adaptive evolution, a statistic denoted .... But such tests are subject to many biases. Most serious is the possibility that high estimates of ... reflect demographic changes rather than adaptive substitution. Testing for between-locus variation in α is one possible way of distinguishing between demography and selection. However, such tests have yielded contradictory results, and their efficacy is unclear. Estimates of ... from the same model organisms have also varied widely. This study clarifies the reasons for these discrepancies, identifying several method-specific biases in widely used estimators and assessing the power of the methods. As part of this process, a new maximum-likelihood estimator is introduced. This estimator is applied to a newly compiled data set of 115 genes from Drosophila simulans, each with orthologs from D. melanogaster and D. yakuba. In this way, it is estimated that ..., a value that does not vary substantially between different loci or over different periods of divergence. The implications of these results are discussed.

(ProQuest Information and Learning: ... denotes formulae omitted.)

THE McDonald-Kreitman test (McDONALD and KREITMAN 1991; KREITMAN and AKASHI 1995) is an important technique for quantifying the contribution of positive Darwinian selection to molecular evolution. The test compares levels of polymorphism within a species to measures of divergence between species and relies on the assumption that a certain class of mutations can be treated as effectively neutral, a priori. Following McDonald and Kreitman, most studies have focused on protein-coding sequences and used synonymous mutations as their assumed-neutral referent. As such, the tests compare levels of synonymous polymorphism (P_s) and divergence (D_s) with their nonsynonymous (amino acid changing) equivalents (P_n and D_n). The focus of many studies has been to estimate the proportion of the nonsynonymous divergence, D_n, that was due to adaptive evolution, a statistic that is denoted α.
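As a concrete illustration of how the four counts combine, the sketch below uses the standard single-locus form of the estimator, α = 1 - (D_s P_n)/(D_n P_s). This formula is not quoted in the excerpt itself, so it is an assumption about the form intended; it treats synonymous changes as neutral, as described above. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def alpha_single_locus(d_n, d_s, p_n, p_s):
    """Single-locus McDonald-Kreitman-style estimate of alpha.

    d_n, d_s: nonsynonymous and synonymous divergence counts
    p_n, p_s: nonsynonymous and synonymous polymorphism counts
    Assumes synonymous mutations are effectively neutral.
    """
    if d_n <= 0 or p_s <= 0:
        # The ratio is undefined when either denominator count is zero.
        raise ValueError("d_n and p_s must be positive")
    return 1.0 - (d_s * p_n) / (d_n * p_s)


# Hypothetical counts, not taken from any data set in the article.
alpha = alpha_single_locus(d_n=20, d_s=40, p_n=5, p_s=20)
print(alpha)  # 0.5
```

With these made-up counts, the ratio of nonsynonymous to synonymous polymorphism (5/20) predicts 10 neutral nonsynonymous substitutions given 40 synonymous ones; 20 are observed, so half of them are attributed to adaptive evolution.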

A serious problem with these tests is that levels of polymorphism are typically low in most population samples at most loci, especially if rare variants are excluded, and this means that single-locus estimates of α can be unreliable. To solve this problem, many methods of combining data from multiple loci have been introduced (FAY et al. 2001; BUSTAMANTE et al. 2002; SMITH and EYRE-WALKER 2002; SAWYER et al. 2003; BIERNE and EYRE-WALKER 2004). Such methods can be used to estimate ..., the average value of α across the sampled loci.
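A minimal sketch of why combining loci helps: one simple approach is to sum each count across loci before forming the ratio. This pooling is shown only as an illustration and is not presented as the exact procedure of any of the studies cited above. The function name, the tuple layout, and the counts are all hypothetical.

```python
def alpha_pooled(loci):
    """Pooled estimate of average alpha across loci.

    loci: iterable of (d_n, d_s, p_n, p_s) count tuples, one per locus.
    Counts are summed across loci before the ratio is taken, so a locus
    with no synonymous polymorphism does not make the estimate undefined.
    """
    loci = list(loci)
    d_n = sum(locus[0] for locus in loci)
    d_s = sum(locus[1] for locus in loci)
    p_n = sum(locus[2] for locus in loci)
    p_s = sum(locus[3] for locus in loci)
    if d_n <= 0 or p_s <= 0:
        raise ValueError("pooled d_n and p_s must be positive")
    return 1.0 - (d_s * p_n) / (d_n * p_s)


# Hypothetical counts (d_n, d_s, p_n, p_s); not data from the article.
example_loci = [
    (10, 20, 2, 10),
    (5, 15, 1, 0),   # p_s = 0: a single-locus estimate is undefined here
    (5, 5, 2, 10),
]
print(alpha_pooled(example_loci))  # 0.5
```

The second hypothetical locus has no synonymous polymorphism, so it cannot yield an estimate on its own, yet it still contributes to the pooled value.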

However, it is now clear that different variants of the test have given different results when applied to data from the same model organism. Consider, for example, published results using polymorphism data from Drosophila simulans. SMITH and EYRE-WALKER (2002) introduced a heuristic estimator of α that they applied to a data set of 35 loci. Measuring divergence from D. yakuba, they estimated that ... (i.e., that ~45% of the divergence between D. simulans and D. yakuba was driven by positive selection). In contrast, FAY et al. (2002) applied their own earlier estimator (FAY et al. 2001) to the 23-locus data set of BEGUN (2001); with divergence measured from the common ancestor with D. melanogaster, they obtained an estimate of .... An even higher estimate was obtained by SAWYER et al. (2003), whose distinctive version of the test is set within a firm probabilistic framework (SAWYER and HARTL 1992; BUSTAMANTE et al. 2001). Using a set of 56 D. simulans loci and measuring divergence from D. melanogaster, they estimated that ~94% of the nonsynonymous divergence was adaptively driven. Finally, BIERNE and EYRE-WALKER (2004) introduced a maximum-likelihood estimator, which they applied to several data sets. …
