Academic journal article Genetics

Inference and Analysis of Population Structure Using Genetic Data and Network Theory

Inference of population structure from genetic data is often used to understand the evolutionary and demographic processes experienced by populations, and is an important aspect of many genetic studies. Such inference is mainly done by clustering individuals into groups, often referred to as demes or subpopulations. Evaluating population structure and levels of gene flow between subpopulations allows inference about migration patterns and their genetic consequences (Templeton 2006; Allendorf et al. 2012). As sequencing of larger portions of the genome becomes more readily available, there is an increasing need for a variety of computationally efficient, statistically testable methods for such inference.

Analysis of population structure can be done at the subpopulation-population level, by assuming putative subpopulations and studying how these relate genetically [e.g., F-statistics, analysis of molecular variance (AMOVA) (Excoffier et al. 1992), phylogenetic methods (Cavalli-Sforza and Edwards 1967; Saitou and Nei 1987; Pickrell and Pritchard 2012)], or at the individual-subpopulation level, by attempting to assign individuals to subpopulations. Methods for clustering individuals based on genetic data can be further divided into two categories: model-based approaches and distance-based approaches (Pritchard et al. 2000; Alexander et al. 2009; Wollstein and Lao 2015). Model-based approaches evaluate the likelihood of the observed data, assuming that they are randomly drawn from a predefined model of the population, e.g., that there are K subpopulations and that these subpopulations are at Hardy-Weinberg equilibrium (HWE). Distance-based approaches aim to identify clusters by analyzing matrices of genetic distances or genetic similarities between individuals or populations, e.g., by visualization using multidimensional scaling (MDS) methods such as principal component analysis (PCA). Distance-based methods are usually model-free and, unlike model-based methods, do not require prior assumptions about the population. Over the last decade or so, model-based methods have dominated inference about population structure, mostly through Bayesian clustering and maximum-likelihood techniques implemented in programs such as STRUCTURE, ADMIXTURE (Alexander et al. 2009), and BAPS (Corander et al. 2003). It has been pointed out that distance-based methods have several disadvantages (Pritchard et al. 2000): they are not rigorous enough and rely on graphical visualization, they depend on the distance measure used, it is difficult to assess the significance of the resulting clustering, and it is difficult to incorporate additional information such as the geographic location of the samples. Jombart et al. (2008) and Yang et al. (2012) address this last concern. Given these disadvantages, distance-based methods would seem less suitable for statistical inference about population structure. However, model-based approaches must restrict the interpretation of their results, because they rely heavily on the prior assumptions of the model, e.g., that the populations meet certain equilibrium conditions such as migration-drift equilibrium or HWE (Pritchard et al. 2000).
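
To make the model-based idea concrete, the following is a minimal Python sketch under stated assumptions; it is not the algorithm of STRUCTURE, ADMIXTURE, or BAPS, which infer allele frequencies and assignments jointly. It assumes K = 2 putative subpopulations whose reference-allele frequencies at three biallelic loci are already known (the values are invented for illustration), codes each genotype as the number (0, 1, or 2) of reference-allele copies, and scores each individual by its log-likelihood under HWE within each subpopulation, treating loci as independent. All function names and data are hypothetical.

```python
import math
from typing import List

# Hypothetical reference-allele frequencies at three biallelic loci in
# K = 2 putative subpopulations (invented values, assumed known here).
FREQS = [
    [0.9, 0.8, 0.1],  # subpopulation 0
    [0.2, 0.3, 0.7],  # subpopulation 1
]

def hwe_genotype_prob(g: int, p: float) -> float:
    """Probability of carrying g copies (0, 1 or 2) of an allele with frequency p under HWE."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    if g == 2:
        return p * p
    if g == 1:
        return 2.0 * p * (1.0 - p)
    if g == 0:
        return (1.0 - p) ** 2
    raise ValueError("genotype must be coded 0, 1 or 2")

def log_likelihood(genotypes: List[int], freqs: List[float]) -> float:
    """Log-likelihood of one multilocus genotype, assuming HWE within the
    subpopulation and independence (linkage equilibrium) among loci."""
    return sum(math.log(hwe_genotype_prob(g, p)) for g, p in zip(genotypes, freqs))

def most_likely_subpopulation(genotypes: List[int], subpop_freqs: List[List[float]]) -> int:
    """Index of the subpopulation under which the genotype is most likely."""
    scores = [log_likelihood(genotypes, f) for f in subpop_freqs]
    return max(range(len(scores)), key=lambda k: scores[k])

# Prints subpopulation 0, 1 and 1 for the three example individuals.
for individual in ([2, 2, 0], [0, 0, 2], [1, 1, 1]):
    print(individual, most_likely_subpopulation(individual, FREQS))
```

Because the score is a sum over loci of HWE genotype probabilities, it depends directly on the equilibrium and independence assumptions discussed above; if those assumptions fail, the likelihoods, and hence the assignments, may be misleading.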

There has recently been a flourishing of network-theory applications to genetic questions in genomics (Forst 2002), landscape genetics (Garroway et al. 2008), and population structure at the subpopulation-population level (Dyer and Nason 2004; Rozenfeld et al. 2008; Ball et al. 2010; Munwes et al. 2010). Recently, NETVIEW (Neuditschko et al. 2012), a network-based tool for visualizing fine-scale genetic population structure that uses a superparamagnetic clustering algorithm (Blatt et al. 1996), has been proposed and applied successfully to the analysis of livestock breeds (Burren et al. 2014; Neuditschko et al. 2014), and other network clustering approaches have also been applied to genetic data (Cohen et al. …
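
As a rough illustration of how a genetic network can be built from a distance matrix, the following minimal Python sketch computes a pairwise allele-sharing distance between individuals (genotypes coded 0/1/2 per biallelic locus), links each individual to its k nearest neighbours, and reports connected components as clusters. It is not NETVIEW, superparamagnetic clustering, or the method of this article; the k-nearest-neighbour rule, the choice of k, the toy genotypes, and all function names are assumptions made for illustration only. Connected components are a crude stand-in for network clustering, and a dedicated community-detection algorithm would usually be preferable.

```python
from typing import List, Set

# Hypothetical toy data: 6 individuals x 4 biallelic loci, each genotype coded
# as the number of copies (0, 1, 2) of a reference allele.
GENOTYPES = [
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],  # intended group A
    [0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 2],  # intended group B
]

def allele_sharing_distance(a: List[int], b: List[int]) -> float:
    """Mean over loci of |g_a - g_b| / 2: 0 if genotypes are identical, 1 if no alleles are shared."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))

def distance_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Symmetric matrix of pairwise allele-sharing distances."""
    return [[allele_sharing_distance(a, b) for b in genotypes] for a in genotypes]

def knn_network(dist: List[List[float]], k: int) -> List[Set[int]]:
    """Undirected graph as adjacency sets: each node is linked to its k nearest
    neighbours (ties broken by index); an edge exists if either endpoint chose it."""
    n = len(dist)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def connected_components(adj: List[Set[int]]) -> List[List[int]]:
    """Clusters as connected components, found by iterative depth-first search."""
    seen: Set[int] = set()
    components = []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return components

DIST = distance_matrix(GENOTYPES)
CLUSTERS = connected_components(knn_network(DIST, k=2))
print(CLUSTERS)  # [[0, 1, 2], [3, 4, 5]]
```

With k = 2 every individual's two nearest neighbours fall within its own intended group, so the two groups form separate components; a larger k, or noisier data, would add cross-group edges and merge them, which is one reason the choice of network construction matters.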
