Academic journal article Genetics

The Neutral Coalescent Process for Recent Gene Duplications and Copy-Number Variants

Article excerpt


I describe a method for simulating samples from gene families of size two under a neutral coalescent process, for the case where the duplicate gene either has fixed recently in the population or is still segregating. When a duplicate locus has recently fixed by genetic drift, diversity in the new gene is expected to be reduced, and an excess of rare alleles is expected, relative to the predictions of the standard coalescent model. The expected patterns of polymorphism in segregating duplicates ("copy-number variants") depend both on the frequency of the duplicate in the sample and on the rate of crossing over between the two loci. When the crossover rate between the ancestral gene and the copy-number variant is low, the expected pattern of variability in the ancestral gene will be similar to the predictions of models of either balancing or positive selection, if the frequency of the duplicate in the sample is intermediate or high, respectively. Simulations are used to investigate the effect of crossing over between loci, and gene conversion between the duplicate loci, on levels of variability and the site-frequency spectrum.

(ProQuest: ... denotes formulae omitted.)

DUPLICATED genes are a ubiquitous feature of eukaryotic genomes. Comparative genome sequencing has revealed that distantly related organisms, such as flies, worms, yeast, and humans, have roughly similar gene numbers, but that the sizes of individual gene families vary across organisms (RUBIN et al. 2000). This genome-scale observation implies that genes are gained and lost over the course of evolution. In the last decade, considerable attention has been paid to using comparative genomic and functional data to elucidate the evolutionary forces shaping gene families (e.g., LYNCH and CONERY 2000; KONDRASHOV et al. 2002; GU et al. 2002a,b; THORNTON and LONG 2002; GU et al. 2003; GAO and INNAN 2004).

In parallel with the analysis of genomewide data, the systematic identification of recent duplication events in Drosophila species has uncovered several cases of lineage-specific genes, which have been studied in an effort to understand the importance of natural selection in the early stages of the evolution of "new" genes (e.g., LONG and LANGLEY 1993; WANG et al. 2000, 2002, 2004; BELTRAN et al. 2002; BELTRAN and LONG 2003; JONES et al. 2005; LOPPIN et al. 2005; ARGUELLO et al. 2006; LEVINE et al. 2006; FAN and LONG 2007). Examples of recent gene duplications have also been described in humans, mice, and plant species (reviewed in LONG et al. 2003). In general, these studies consist of three parts: first, the identification of the recent duplicate; second, an investigation of patterns of polymorphism and/or divergence; and third, some assay of function, often at the level of gene expression, to show that the new gene is functional.

The examples cited above all describe new genes that are fixed in population samples (the recent duplicate is found in all individuals sampled). There is currently much interest in identifying polymorphic duplications (so-called "copy-number variants," or CNVs), particularly in the human genome (BAILEY et al. 2002, 2004; CHEUNG et al. 2003; IAFRATE et al. 2004; LI et al. 2004; SEBAT et al. 2004; SHARP et al. 2005, 2006; CONRAD et al. 2006; LOCKE et al. 2006; PERRY et al. 2006; REDON et al. 2006; GRAUBERT et al. 2007), as it is believed that CNVs may contribute significantly to the genetic basis of disease. While CNVs have been implicated in several diseases (SHARP et al. 2006; SEBAT et al. 2007; reviewed in KONDRASHOV and KONDRASHOV 2006), they are also of significant evolutionary interest, as they will likely provide valuable insight into the earliest stages of the evolution of new genes.

Few frameworks are currently available for analyzing polymorphism data from recent duplicates and CNVs. With regard to the analysis of single-nucleotide polymorphism data, the coalescent process (HUDSON 1983; TAJIMA 1983) has been well studied for single-copy genes. …
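For readers unfamiliar with the single-copy baseline mentioned above, the following is a minimal sketch of the standard neutral coalescent for one nonrecombining locus in a population of constant size. It is not the duplicate-gene method described in this article. The function names (`poisson`, `simulate_sfs`, `mean_sfs`) and the parameter `theta` (taken here to mean 4Nμ for the locus, with time measured in units of 2N generations) are assumptions made for illustration only. The sketch returns the unfolded site-frequency spectrum, whose expected value in class i under this model is theta/i.

```python
import random


def poisson(rng, mean):
    """Draw a Poisson count by counting unit-rate arrivals before `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_sfs(n, theta, rng):
    """Simulate one sample of n chromosomes under the standard neutral
    coalescent (single-copy locus, constant size, no recombination).

    Returns a list `sfs` of length n in which sfs[i] is the number of
    mutations carried by exactly i sampled chromosomes (sfs[0] is unused).
    Assumption: theta = 4*N*mu, time in units of 2N generations.
    """
    # Each lineage is tracked by the number of sampled chromosomes below it.
    lineages = [1] * n
    sfs = [0] * n
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence: rate k(k-1)/2.
        t = rng.expovariate(k * (k - 1) / 2.0)
        # Mutations fall on each lineage at rate theta/2 per unit time.
        for size in lineages:
            sfs[size] += poisson(rng, theta / 2.0 * t)
        # Merge two lineages chosen uniformly at random.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] + lineages[j]
        lineages = [s for idx, s in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return sfs


def mean_sfs(n, theta, reps, seed=1):
    """Average the simulated spectrum over `reps` independent replicates."""
    rng = random.Random(seed)
    totals = [0] * n
    for _ in range(reps):
        for i, count in enumerate(simulate_sfs(n, theta, rng)):
            totals[i] += count
    return [x / reps for x in totals]
```

Calling `mean_sfs(10, 5.0, 2000)` gives a spectrum that can be compared with the theta/i expectation. An excess of rare alleles, such as the abstract describes for recently fixed duplicates, would appear as more weight in the low-frequency classes than this baseline predicts.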
