Academic journal article Genetics

Modeling Human Population Separation History Using Physically Phased Genomes

Article excerpt

(ProQuest: ... denotes formulae omitted.)

Haplotypes contain rich information about population history and are shaped by population size, natural selection, and recombination (Veeramah and Hammer 2014; Schraiber and Akey 2015). Because of historical recombination events, there are hundreds of thousands of pairs of loci along a chromosome that have distinct histories. Recent methodological advances permit the estimation of a detailed population demographic history from one or several whole-genome sequences, based on the distribution of coalescent times across the genome. For example, Li and Durbin (2011) developed the pairwise sequentially Markovian coalescent (PSMC) model to reconstruct the distribution of the time since the most recent common ancestor (TMRCA) between the two alleles of an individual and to infer population size changes over time. Typically, these TMRCA values are calculated using the two haploid genomes that compose the diploid genome of a single sample (Li and Durbin 2011). When PSMC is applied to two haplotypes obtained from different populations, the inferred TMRCA distribution is informative about the timing of population splits, since the time after which nearly no coalescence events occur is a good estimate of the population split time. One key question regarding human population history is the timing of population splits and the dynamics of separation between Africans and non-Africans, which has a great influence on modern genetic diversity. Li and Durbin (2011) paired X chromosomes from African and non-African males and suggested that the two groups remained as one population until 60-80 KYA, with substantial genetic exchange up until 20-40 KYA [assuming a mutation rate of 2.5 × 10⁻⁸ per bp per generation and 25 years as generation time; these estimates approximately double when assuming a mutation rate of 1.25 × 10⁻⁸ per bp per generation and 30 years as generation time (Schiffels and Durbin 2014)].
Subsequently, PSMC applied to pseudodiploid sequences was used to date the divergence time between nonhuman primate subspecies (Prado-Martinez et al. 2013). However, PSMC curves themselves provide only a qualitative measure of population separation and estimating split times is complicated by the presence of migration (Pritchard 2011).
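
The dependence of these dates on the assumed mutation rate and generation time can be made explicit with a short calculation. The sketch below is illustrative only: it assumes that a date in years scales as generation time divided by the per-generation mutation rate, which is the usual way coalescent times are converted to calendar time, and the function name `rescale_time` is hypothetical rather than part of PSMC.

```python
def rescale_time(years, mu_old, gen_old, mu_new, gen_new):
    """Convert a date estimated under (mu_old, gen_old) to (mu_new, gen_new).

    Assumption: years = (scaled divergence / mutation rate) * generation time,
    so a date is proportional to generation_time / mutation_rate.
    """
    factor = (gen_new / mu_new) / (gen_old / mu_old)
    return years * factor


# Parameter sets quoted in the text above.
MU_FAST, GEN_SHORT = 2.5e-8, 25    # per bp per generation, years
MU_SLOW, GEN_LONG = 1.25e-8, 30

# Scaling factor between the two parameter sets.
factor = rescale_time(1.0, MU_FAST, GEN_SHORT, MU_SLOW, GEN_LONG)
print(f"scaling factor: {factor:.2f}")

# Rescale the dates quoted in the text (in thousands of years).
for kya in (60, 80, 20, 40):
    rescaled = rescale_time(kya, MU_FAST, GEN_SHORT, MU_SLOW, GEN_LONG)
    print(f"{kya} KYA -> {rescaled:.0f} KYA")
```

Under this assumption the factor is (2.5/1.25) × (30/25) = 2.4, consistent with the statement that the estimates approximately double.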

The multiple sequentially Markovian coalescent (MSMC) model (Schiffels and Durbin 2014) extends PSMC to multiple individuals, focusing on the first coalescence event for any pair of haplotypes. With multiple haplotypes from different populations, MSMC calculates the ratio between cross-population and within-population coalescence rates, termed the "relative cross-coalescence rate," a value reflecting population separation history. Schiffels and Durbin (2014) applied MSMC to statistically phased genomes (two or four haplotypes per population) and suggested that African and non-African populations exhibited a slow, gradual separation beginning earlier than 200 KYA and lasting until ~40 KYA, with the median point of this divergence at ~60-80 KYA. The midpoint of the relative cross-coalescence decay curve has been used as an estimate of population separation time (Pagani et al. 2015; Schiffels and Durbin 2014). Although useful, this approach does not generate parametric estimates of population history under standard models. Because none of these methods for inferring population separation history has been applied to physically phased genomes, it is unclear how phasing errors and missing data affect this type of analysis.
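
As a rough illustration of how a midpoint is read off such a curve, the sketch below computes a relative cross-coalescence rate per time interval and interpolates the time at which it crosses 0.5. Several things here are assumptions and not taken from this excerpt: the normalization 2λ_cross / (λ_A + λ_B), the linear interpolation in time, the function names, and the toy rates, which are in arbitrary units and are not results from any data set.

```python
def relative_ccr(cross, within_a, within_b):
    """Relative cross-coalescence rate per time interval.

    Assumed normalization: twice the cross-population rate divided by
    the sum of the two within-population rates.
    """
    return [2.0 * c / (a + b) for c, a, b in zip(cross, within_a, within_b)]


def midpoint_time(times, rccr, level=0.5):
    """Time at which the curve first rises through `level`.

    `times` must be sorted from recent to ancient. Linear interpolation
    between neighbouring points is an assumption of this sketch.
    Returns None if the curve never crosses `level`.
    """
    for i in range(1, len(times)):
        lo, hi = rccr[i - 1], rccr[i]
        if lo < level <= hi:
            frac = (level - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


# Toy values in arbitrary units, for illustration only.
times = [1.0, 2.0, 3.0, 4.0]
cross = [0.0, 0.5, 1.5, 2.0]
within_a = [2.0, 2.0, 2.0, 2.0]
within_b = [2.0, 2.0, 2.0, 2.0]

rccr = relative_ccr(cross, within_a, within_b)
split_estimate = midpoint_time(times, rccr)
print(rccr, split_estimate)
```

A value near 1 means haplotypes from the two populations coalesce as readily as haplotypes within a population (effectively one population), while a value near 0 means the populations are fully separated. The midpoint is a single summary of a curve that may describe a gradual separation.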

In this article, we construct physically phased genomes of five individuals from diverse African populations [including Yoruba (YRI), Esan (ESN), Gambia (GWD), Maasai (MKK), and Mende (MSL)]. We also reanalyze fosmid sequencing data for individuals from the Gujarati (GIH), San, and Mbuti populations, assess the ability to correctly assemble SNP haplotypes using fosmid pool sequencing, and compare the resulting data with statistically phased haplotypes. We have previously compared several reconstructed haplotypes from a subset of these samples with those released by phase three of the 1000 Genomes Project (1000 Genomes Project Consortium 2015). …
