Academic journal article Genetics

Exploring Population Genetic Models with Recombination Using Efficient Forward-Time Simulations

Article excerpt

ABSTRACT

We present an exact forward-in-time algorithm that can efficiently simulate the evolution of a finite population under the Wright-Fisher model. We used simulations based on this algorithm to verify the accuracy of the ancestral recombination graph approximation by comparing it to the exact Wright-Fisher scenario. We find that the recombination graph is generally a very good approximation for models with complete outcrossing, whereas, for models with self-fertilization, the approximation becomes slightly inexact for some combinations of selfing and recombination parameters.

COALESCENT theory provides a continuous-time approximation for the history of small samples in large populations and coalescent simulation is a widely used tool in population genetics. Under this framework, the genealogy of a sample of DNA sequences is modeled backward in time and neutral mutations are superposed on this genealogy to generate sequence polymorphism data (Kingman 1982; Hudson 1983; Rosenberg and Nordborg 2002). Forward simulations, in contrast, model the evolution of all the sequences in a population exactly, forward in time and generation by generation. Because coalescent simulations consider only those chromosomes that carry material ancestral to the sample and, by making a continuous-time approximation, skip uninteresting generations whose events do not affect the sample, they are computationally much more efficient than forward simulation programs. However, despite their inefficiency, forward simulations are necessary if we wish to simulate data sets under complex and realistic biological scenarios (e.g., natural selection at multiple linked loci) that are difficult to model accurately using the coalescent. Given the dramatic growth in the power of computing, forward-time simulations are currently feasible for large genomic regions (e.g., megabase scale) and many simulation packages have been developed recently (e.g., Balloux 2001; Hey 2004; Hoggart et al. 2005; Peng and Kimmel 2005; Dudek et al. 2006; Guillaume and Rougemont 2006; Sanford et al. 2007) and have also found important applications (e.g., Balloux and Goudet 2002; Pineda-Krch and Redfield 2005; Peng and Kimmel 2007). Here, we present an exact forward-in-time algorithm that can efficiently simulate the evolution of a finite population undergoing mutations, recombination, and natural selection at multiple linked loci.
In contrast to existing forward-time simulators that consider the population genealogy generation by generation, our forward algorithm uses the genealogical information for multiple generations at a time, and on the basis of this information, simulates only those chromosomes in the next generation that can potentially contribute to the future population. We show that such a forward-backward scheme combined with other optimizations can lead to substantial improvements in run-time efficiency. We use our simulation program to evaluate coalescent models with recombination by comparing them to the exact Wright-Fisher model.
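The forward-backward idea can be illustrated with a minimal C++ sketch. This is not the authors' implementation, whose details are not given in this excerpt. It assumes a neutral Wright-Fisher pedigree in which chromosomes 2i and 2i+1 belong to diploid individual i, and it records at most two parent chromosomes per child. The names `Parents`, `Pedigree`, `samplePedigree`, and `markContributors` are hypothetical. The pedigree is first drawn forward for a block of generations, then traced backward to flag the chromosomes that leave descendants at the end of the block.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Parent chromosome indices (in the previous generation) of one child
// chromosome. Assumption: first == second when no recombination occurred.
using Parents = std::pair<std::size_t, std::size_t>;

// pedigree[g][c] = parents, in generation g, of chromosome c in generation g + 1.
using Pedigree = std::vector<std::vector<Parents>>;

// Forward step: draw parents for nGen consecutive generations.
// Assumption: chromosomes 2i and 2i+1 form diploid individual i, and a child
// chromosome is a recombinant of its parent's pair with probability recombProb.
Pedigree samplePedigree(std::size_t nChrom, std::size_t nGen,
                        double recombProb, std::mt19937& rng) {
    assert(nChrom >= 2 && nChrom % 2 == 0);
    std::uniform_int_distribution<std::size_t> pickIndividual(0, nChrom / 2 - 1);
    std::bernoulli_distribution pickSecond(0.5);
    std::bernoulli_distribution recombines(recombProb);

    Pedigree ped(nGen, std::vector<Parents>(nChrom));
    for (auto& generation : ped) {
        for (auto& p : generation) {
            const std::size_t individual = pickIndividual(rng);
            const std::size_t a = 2 * individual + (pickSecond(rng) ? 1 : 0);
            const std::size_t b = recombines(rng) ? (a ^ 1) : a;  // homolog of a
            p = Parents(a, b);
        }
    }
    return ped;
}

// Backward step: needed[g][c] is true if chromosome c of generation g has at
// least one descendant in the last generation of the block.
std::vector<std::vector<bool>> markContributors(const Pedigree& ped,
                                                std::size_t nChrom) {
    const std::size_t nGen = ped.size();
    std::vector<std::vector<bool>> needed(nGen + 1,
                                          std::vector<bool>(nChrom, false));
    needed[nGen].assign(nChrom, true);  // everyone in the final generation counts
    for (std::size_t g = nGen; g-- > 0;) {
        for (std::size_t c = 0; c < nChrom; ++c) {
            if (!needed[g + 1][c]) continue;  // child leaves no descendants
            needed[g][ped[g][c].first] = true;
            needed[g][ped[g][c].second] = true;
        }
    }
    return needed;
}
```

In a simulator built along these lines, sequence data would be generated only for chromosomes whose `needed` flag is set, so lineages that die out within the block cost nothing beyond the pedigree draw. Handling selection would require more than this neutral sketch shows.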

SIMULATION ALGORITHM

Our algorithm is implemented in the C++ programming language and we simulate data sets under the Wright-Fisher model assumptions. Individuals in a population are assumed to be diploid, the population size is assumed constant (this assumption can readily be relaxed), and generations are always nonoverlapping. Chromosomes within the population are represented by sorted arrays of integers that correspond to the locations of their mutations in base pairs. In this representation, a location is considered polymorphic if it occurs in some but not all of the chromosomes. Over time, the chromosome arrays undergo changes due to recombination (i.e., are partially replaced by parts of other arrays) and mutation (i.e., new integer locations get inserted). They also increase or decrease in the number of copies due to genetic drift. At any given time, we keep track of chromosomes belonging only to the current and previous generations and keep reusing these arrays. …
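The sorted-array representation described above can be sketched in a few lines of C++. This is an illustrative sketch under stated assumptions, not the authors' code. The names `Chromosome`, `addMutation`, `recombine`, and `isPolymorphic` are hypothetical, recombination is reduced to a single crossover at a given breakpoint, and a mutation at an already mutated position is simply ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A chromosome is the sorted list of positions (in base pairs) that carry
// a mutation.
using Chromosome = std::vector<int>;

// Mutation: insert a new position while keeping the array sorted.
// Assumption: a position that is already present is left unchanged.
void addMutation(Chromosome& c, int pos) {
    auto it = std::lower_bound(c.begin(), c.end(), pos);
    if (it == c.end() || *it != pos) c.insert(it, pos);
}

// Recombination with one crossover (a simplifying assumption): positions
// below the breakpoint come from `left`, the rest from `right`.
Chromosome recombine(const Chromosome& left, const Chromosome& right,
                     int breakpoint) {
    auto leftEnd = std::lower_bound(left.begin(), left.end(), breakpoint);
    auto rightBegin = std::lower_bound(right.begin(), right.end(), breakpoint);
    Chromosome child(left.begin(), leftEnd);
    child.insert(child.end(), rightBegin, right.end());
    return child;  // still sorted: all left positions < breakpoint <= right positions
}

// A position is polymorphic if some, but not all, chromosomes carry it.
bool isPolymorphic(const std::vector<Chromosome>& population, int pos) {
    std::size_t carriers = 0;
    for (const auto& c : population) {
        if (std::binary_search(c.begin(), c.end(), pos)) ++carriers;
    }
    return carriers > 0 && carriers < population.size();
}
```

For example, recombining `{100, 300, 500, 900}` with `{200, 500, 700}` at breakpoint 450 yields `{100, 300, 500, 700}`. Because both inputs are sorted, the crossover reduces to two binary searches and two range copies, which is one plausible reason to prefer this layout over a per-site bit vector for long, sparsely mutated regions.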
