Academic journal article Genetics

The Effects of Demography and Long-Term Selection on the Accuracy of Genomic Prediction with Sequence Data

Article excerpt

(ProQuest: ... denotes formulae omitted.)

Methodology has been developed to predict genetic value for polygenic traits in livestock and crops by exploiting high-density genome-wide SNP genotypes that are fitted simultaneously in an analytical model (Meuwissen et al. 2001). The same methodology can be applied in human genetics, for example, to predict complex disease risk (reviewed in De los Campos et al. 2010). In livestock and plants this analytical approach is often referred to as "genomic selection" because the genomic predictions are used for selection decisions. First, a large "reference population" with genotypes and phenotypes is required to jointly estimate genome-wide SNP effects. Then the accuracy of prediction using the estimated SNP effects is reevaluated in an independent "validation population," before the genomic prediction equation is routinely applied on individuals with genotypes but no phenotypes.
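The reference/validation workflow described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the method used in the article: ridge regression stands in for the joint SNP model, the SNPs are themselves the causal loci (so there is no LD to exploit), accuracy is measured as the correlation between predicted and observed phenotypes, and all sizes, the penalty `LAMBDA`, and the function names are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the toy run is repeatable

N_SNPS, N_REF, N_VAL = 20, 200, 100  # hypothetical sizes, far smaller than real data
LAMBDA = 1.0                          # hypothetical ridge penalty
true_effects = [rng.gauss(0, 1) for _ in range(N_SNPS)]

def simulate(n):
    """Genotypes as 0/1/2 allele counts (allele frequency 0.5) plus noisy phenotypes."""
    X = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(N_SNPS)]
         for _ in range(n)]
    y = [sum(g * b for g, b in zip(row, true_effects)) + rng.gauss(0, 1) for row in X]
    return X, y

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_snp_effects(X, y, lam):
    """Ridge regression: all SNP effects are estimated jointly from the reference set."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    rhs = [sum(r[i] * (v - y_mean) for r, v in zip(Xc, y)) for i in range(m)]
    return solve(A, rhs), x_mean, y_mean

def predict(X, effects, x_mean, y_mean):
    """Apply the prediction equation to genotypes only."""
    return [y_mean + sum((g - mu) * b for g, mu, b in zip(row, x_mean, effects))
            for row in X]

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

X_ref, y_ref = simulate(N_REF)   # reference population: genotypes and phenotypes
X_val, y_val = simulate(N_VAL)   # independent validation population
effects, x_mean, y_mean = fit_snp_effects(X_ref, y_ref, LAMBDA)
accuracy = correlation(predict(X_val, effects, x_mean, y_mean), y_val)
print(f"validation accuracy (correlation): {accuracy:.2f}")
```

Once validated, the same `predict` call could be applied to individuals that have genotypes but no phenotypes.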

Genomic prediction (GP) methods generally use dense genome-wide SNP genotypes and therefore rely on exploiting linkage disequilibrium (LD) between these SNPs and unknown causative mutations or quantitative trait loci (QTL). The lower the LD between the SNPs and causal mutations, the lower the accuracy of GP will be. As the number of generations separating the reference and validation populations increases, the LD between SNPs and causative mutations is further eroded by recombination and therefore the accuracy of GP will fall. The impact of recombination could be eliminated if the prediction were based on the causal mutations themselves. This would be possible if we had access to whole-genome sequence, and this is increasingly likely as the cost of sequencing falls. Furthermore, a number of species-specific databanks of whole-genome sequence are being generated and, using these as reference genomes, it is possible to impute full sequence for many thousands of individuals that have been genotyped with high-density SNP chips.
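The erosion of LD by recombination can be made concrete with the standard population-genetic expectation that, under random mating, the LD coefficient D between two loci shrinks by a factor (1 - c) each generation, where c is the recombination fraction between them. This formula is not taken from the article; the sketch below assumes it and ignores drift, selection, and mutation, and the function name and example values are hypothetical.

```python
def ld_after(d0, c, generations):
    """Expected LD coefficient D after a number of generations of random mating.

    d0: initial LD coefficient between a SNP and a causal mutation
    c: recombination fraction between the two loci (0 to 0.5)
    """
    return d0 * (1.0 - c) ** generations

# Compare a tightly linked SNP (c = 0.001) with a loosely linked one (c = 0.05).
for t in (0, 1, 5, 10, 50):
    tight = ld_after(0.25, 0.001, t)
    loose = ld_after(0.25, 0.05, t)
    print(f"generation {t:2d}: tight={tight:.4f} loose={loose:.4f}")
```

Under this assumption, LD with a loosely linked SNP decays much faster than LD with a tightly linked one, and prediction based on the causal mutation itself (c = 0) would not decay through recombination at all.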

Several simulation studies have shown that there would be a significant advantage for genomic prediction using sequence compared to the equivalent of 30,000-60,000 genome-wide SNPs in an ~30-M genome (Meuwissen and Goddard 2010; Clark et al. 2011; Druet et al. 2014). However, these studies did not compare the use of sequence data with the higher-density commercial SNP arrays that are now commonly used for a number of species (for example, ≥600,000 SNPs in humans and cattle).

An additional argument for using sequence for GP is that it should be particularly advantageous when QTL have been under long-term negative selection (such as for disease or fertility traits): causal variants are then more likely to be rare and therefore in low LD with SNPs on commercial chips, which typically have minor allele frequency (MAF) > 0.1. A study by Druet et al. (2014) indirectly investigated this potential advantage of sequence by simulating genotype data in which QTL were represented only as rare variants. Given this approach, these authors conclude that sequence data could significantly improve the accuracy of GP compared to the equivalent of 50,000 SNPs genome-wide. However, simulation studies in which only rare variants are chosen to act as surrogate QTL may not provide an adequate model of loci under long-term negative selection. For example, changes in ancestral demography such as a recent bottleneck in effective population size (Ne) also exert a strong influence on the distribution of allele frequencies (e.g., Marth et al. 2004), and even mutations with a deleterious effect on fitness may drift to higher frequencies than would be expected in a population with no recent bottleneck. Also, patterns of LD surrounding loci that are under long-term negative selection may be quite different from those surrounding neutral loci due to "background selection" (Charlesworth et al. 1993).

In this study we investigate the potential advantages of sequence data for genomic prediction and demonstrate that these will depend jointly on the ancestral demography of a population, the presence or absence of long-term negative selection acting on QTL, and the method of analysis. …
