Overdispersion of the Molecular Clock Varies between Yeast, Drosophila and Mammals
Bedford, Trevor, Wapinski, Ilan, Hartl, Daniel L., Genetics
Although protein evolution can be approximated as a "molecular evolutionary clock," it is well known that sequence change departs from a clock-like Poisson expectation. Through studying the deviations from a molecular clock, insight can be gained into the forces shaping evolution at the level of proteins. Generally, substitution patterns that show greater variance than the Poisson expectation are said to be "overdispersed." Overdispersion of sequence change may result from temporal variation in the rate at which amino acid substitutions occur on a phylogeny. By comparing the genomes of four species of yeast, five species of Drosophila, and five species of mammals, we show that the extent of overdispersion shows a strong negative correlation with the effective population size of these organisms. Yeast proteins show very little overdispersion, while mammalian proteins show substantial overdispersion. Additionally, X-linked genes, which have reduced effective population size, have gene products that show increased overdispersion in both Drosophila and mammals. Our research suggests that mutational robustness is more pervasive in organisms with large population sizes and that robustness acts to stabilize the molecular evolutionary clock of sequence change.
PROTEIN sequence divergence is often approximated as a "molecular evolutionary clock" (ZUCKERKANDL and PAULING 1965), where the accumulation of amino acid substitutions is proportional to the time separating the sequences. In the absence of temporal variation, the distribution of substitution counts across a protein's phylogeny is expected to follow a Poisson distribution, where both the mean and the variance of substitution counts are equal to the rate (intensity) parameter λ (OHTA and KIMURA 1971). Consequently, the ratio of the variance to the mean of substitution counts, known as the index of dispersion [R(t)], should equal 1. However, temporal variation in the rate of substitution influences the statistical character of substitution counts occurring over time. If substitution rate varies over time, then substitution counts of evolving proteins are expected to be "overdispersed" with R(t) > 1 (CUTLER 2000). It is now abundantly clear that the accumulation of amino acid sequence change in both mammals (GILLESPIE 1989; SMITH and EYRE-WALKER 2003) and Drosophila (ZENG et al. 1998; KERN et al. 2004; BEDFORD and HARTL 2008) is overdispersed. Additionally, the index of dispersion shows a linear correlation with the mean per-branch substitution count (M) in Drosophila, suggesting that substitution counts are better described by a negative binomial distribution than by a Poisson distribution (BEDFORD and HARTL 2008). Such a negative binomial distribution is consistent with rate variation occurring over time across individual protein phylogenies.
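The contrast between a clock-like Poisson process and a rate-varying process can be illustrated with a small simulation. The sketch below is not the authors' method. It assumes, purely for illustration, that each branch's substitution rate is drawn from a gamma distribution with mean M and shape k, which makes the resulting counts negative binomial with variance M + M²/k, so that the index of dispersion is 1 + M/k and grows linearly with M. The function names and the parameter values (M = 5, k = 2) are hypothetical.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(rng, mean_rate, n_branches, shape=None):
    """Simulate per-branch substitution counts.

    shape=None: every branch shares the same rate (strict clock, Poisson).
    shape=k:    each branch rate is gamma distributed with mean `mean_rate`
                and shape k (an illustrative assumption), which yields
                negative binomial counts.
    """
    counts = []
    for _ in range(n_branches):
        if shape is None:
            rate = mean_rate
        else:
            # gammavariate(alpha, beta) has mean alpha * beta
            rate = rng.gammavariate(shape, mean_rate / shape)
        counts.append(poisson_sample(rng, rate))
    return counts


def index_of_dispersion(counts):
    """Return the sample variance divided by the sample mean."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


if __name__ == "__main__":
    rng = random.Random(1)
    clock = simulate_counts(rng, mean_rate=5.0, n_branches=20000)
    varying = simulate_counts(rng, mean_rate=5.0, n_branches=20000, shape=2.0)
    print("constant rate R:", round(index_of_dispersion(clock), 2))
    print("gamma-varying rate R:", round(index_of_dispersion(varying), 2))
```

Under these assumptions the constant-rate estimate should fall near 1, and the gamma-varying estimate near 1 + 5/2 = 3.5, up to sampling noise.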
Although, historically, the index of dispersion has been used as a test of the neutral theory (OHTA and KIMURA 1971; GILLESPIE 1989), findings of R(t) > 1 do not necessarily imply evidence of selection. Simple models of adaptive evolution suggest that substitutions fixed through positive selection may themselves be Poisson distributed. Additionally, more complex models of neutral evolution incorporating epistasis suggest that purely neutral substitutions may show significant overdispersion. Thus, the index of dispersion represents a test of the extent of heterogeneity of sequence evolution rather than a test of the selective forces at work.
There have been multiple studies of the index of dispersion of sequence evolution using lattice protein simulations (BASTOLLA et al. 2000; WILKE 2004; BLOOM et al. 2007a). Although lattice protein models are heavily abstracted from the real proteins they seek to emulate, they do incorporate some important details of protein evolution. For instance, such lattice models give rise to a many-to-one mapping of genotypes to phenotypes, in which multiple sequences result in the same structure. …