Academic journal article Genetics

Assessing the Relationship of Ancient and Modern Populations

Article excerpt

(ProQuest: ... denotes formulae omitted.)

ANCIENT DNA (aDNA) is now ubiquitous in population genetics. Advances in DNA isolation (Dabney et al. 2013), library preparation (Meyer et al. 2012), bone sampling (Pinhasi et al. 2015), and sequence capture (Haak et al. 2015) make it possible to obtain genome-wide data from hundreds of samples (Allentoft et al. 2015; Haak et al. 2015; Mathieson et al. 2015; Fu et al. 2016). Analysis of these data can provide new insight into recent evolutionary processes, which leave faint signatures in modern genomes, including natural selection (Jewett et al. 2016; Schraiber et al. 2016) and population replacement (Lazaridis et al. 2014; Sjödin et al. 2014).

One of the most powerful uses of aDNA is to assess the continuity of ancient and modern populations. In many cases, it is unclear whether populations that occupied an area in the past are the direct ancestors of the current inhabitants of that area. However, this can be next to impossible to assess using only modern genomes. Questions of population continuity and replacement have particular relevance for the spread of cultures and technology in humans (Lazaridis et al. 2016). For instance, recent work showed that modern South Americans are descended from people associated with the Clovis culture that inhabited North America over 10,000 years ago, further enhancing our understanding of the peopling of the Americas (Rasmussen et al. 2014).

Despite its utility in addressing difficult-to-answer questions in evolutionary biology, aDNA also has several limitations. Most strikingly, DNA decays rapidly following the death of an organism, resulting in highly fragmented, degraded starting material when sequencing (Sawyer et al. 2012). Thus, ancient data are frequently sequenced to low coverage, and have a significantly higher rate of miscalled nucleotides than modern samples. When working with diploid data, as in aDNA extracted from plants and animals, the low coverage prevents genotypes from being called with confidence.

Several strategies are commonly used to address the low-coverage data. One of the most common approaches is to sample a random read from each covered site, and use that as a haploid genotype call (Skoglund et al. 2012; Allentoft et al. 2015; Haak et al. 2015; Mathieson et al. 2015; Fu et al. 2016; Lazaridis et al. 2016). Many common approaches to the analyses of aDNA, such as the usage of F-statistics (Green et al. 2010; Patterson et al. 2012), are designed with this kind of dataset in mind. F-statistics can be interpreted as linear combinations of simpler summary statistics, and can often be understood in terms of testing a tree-like structure relating populations. Nonetheless, despite the simplicity and appeal of this approach, it has several drawbacks. Primarily, it throws away reads from sites that are covered more than once, resulting in a potential loss of information from expensive, difficult-to-acquire data. Moreover, as shown by Peter (2016), F-statistics are fundamentally based on heterozygosity, which is determined by samples of size 2, and thus limited in power. Finally, these approaches are also strongly impacted by sequencing error, postmortem damage (PMD), and contamination.
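The random-read strategy described above can be illustrated with a minimal sketch. The function name `pseudo_haploid_calls` and the input layout (one list of observed bases per site) are assumptions made for illustration, not part of any published pipeline; the sketch also counts the reads that are discarded, which is the information loss noted in the text.

```python
import random


def pseudo_haploid_calls(reads_per_site, seed=None):
    """Draw one read at random from every covered site.

    reads_per_site: list of lists; each inner list holds the bases
    observed in the reads covering that site (hypothetical layout).
    Returns (calls, discarded), where calls[i] is the sampled base or
    None for an uncovered site, and discarded is the number of reads
    that were not used.
    """
    rng = random.Random(seed)
    calls = []
    discarded = 0
    for reads in reads_per_site:
        if not reads:
            # No coverage: no haploid call can be made at this site.
            calls.append(None)
            continue
        calls.append(rng.choice(reads))
        # Every read beyond the sampled one is thrown away.
        discarded += len(reads) - 1
    return calls, discarded


# Toy data: four sites with coverage 1, 3, 0, and 2.
sites = [["A"], ["G", "G", "T"], [], ["C", "C"]]
calls, discarded = pseudo_haploid_calls(sites, seed=1)
```

In this toy input, three of the six reads are unused, and the call at the second site may be either `G` or `T` depending on the draw, which is why a sequencing error or damaged base in a single read can propagate directly into the haploid call.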

On the other hand, several approaches exist to work with either genotype likelihoods or the raw read data. Genotype likelihoods are the probabilities of the read data at a site, given each of the three possible diploid genotypes at that site. They can be used in the calculation of population genetic statistics, or likelihood functions, to average over uncertainty in the genotype (Korneliussen et al. 2014). However, many such approaches assume that genotype likelihoods are fixed by the SNP calling algorithm [although they may be recalibrated to account for aDNA-specific errors, as in Jónsson et al. (2013)]. With low-coverage data, an increase in accuracy is expected if genotype likelihoods are coestimated with other parameters of interest, due to the covariation between processes that influence read quality and genetic diversity, such as contamination. …
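As a minimal sketch of what a genotype likelihood is, the following computes the probability of the read data at one biallelic site under each of the three diploid genotypes. It assumes a simple symmetric error model with a single per-read error rate (`error`) and independent reads; this model, the function name, and the parameter names are illustrative assumptions, not the error model used in the article or in any particular SNP caller.

```python
def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Return [P(reads | g=0), P(reads | g=1), P(reads | g=2)].

    g is the number of alternate alleles in the diploid genotype.
    n_ref, n_alt: counts of reads showing the reference / alternate allele.
    error: assumed probability that a read shows the wrong allele.
    The binomial coefficient is omitted because it is the same for
    all three genotypes.
    """
    likelihoods = []
    for g in (0, 1, 2):
        # A read comes from an alternate chromosome with probability g/2,
        # then is observed correctly with probability (1 - error).
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append((p_alt ** n_alt) * ((1 - p_alt) ** n_ref))
    return likelihoods


# One reference read, no sequencing error.
single_read = genotype_likelihoods(1, 0, error=0.0)

# One read of each allele: the heterozygote is the best-supported genotype.
two_reads = genotype_likelihoods(1, 1, error=0.01)
```

With a single error-free reference read, the heterozygote retains half the likelihood of the reference homozygote, which illustrates why low coverage prevents confident genotype calls and why averaging over the three genotypes is attractive.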
