Academic journal article Genetics

Using Mendelian Inheritance to Improve High-Throughput SNP Discovery

Article excerpt

ABSTRACT Restriction site-associated DNA sequencing or genotyping-by-sequencing (GBS) approaches allow for rapid and cost-effective discovery and genotyping of thousands of single-nucleotide polymorphisms (SNPs) in multiple individuals. However, rigorous quality control practices are needed to avoid high levels of error and bias with these reduced representation methods. We developed a formal statistical framework for filtering spurious loci, using Mendelian inheritance patterns in nuclear families, that accommodates variable-quality genotype calls and missing data (both rampant issues with GBS data) and for identifying sex-linked SNPs. Simulations predict excellent performance of both the Mendelian filter and the sex-linkage assignment under a variety of conditions. We further evaluate our method by applying it to real GBS data and validating a subset of high-quality SNPs. These results demonstrate that our metric of Mendelian inheritance is a powerful quality filter for GBS loci that is complementary to standard coverage and Hardy-Weinberg filters. The described method, implemented in the software MendelChecker, will improve quality control during SNP discovery in nonmodel as well as model organisms.

(ProQuest: ... denotes formulae omitted.)

The advent of next-generation sequencing technologies has revolutionized biological research by allowing the pursuit of fundamental ecological and evolutionary genomics questions in nonmodel organisms (Hudson 2008). It is now feasible to discover genome-wide markers in any species, even if few or no prior genetic resources are available (Ellegren and Sheldon 2008). However, many modern studies now require high-quality genotypes for tens or hundreds of individuals. While recent technological advances have significantly lowered the cost of DNA sequencing, it is still expensive to assay genetic variation in large numbers of individuals (Narum et al. 2013).

Several methods have been developed to reduce the cost of high-throughput genotyping by restricting the complexity of the genome. These methods selectively sequence regions of the genome near restriction sites, allowing simultaneous discovery and genotyping of thousands of single-nucleotide polymorphisms (SNPs) distributed across the genome. Several variations exist, but these methods are generally known as restriction site-associated DNA sequencing (RAD-seq) or genotyping by sequencing (GBS) (reviewed in Davey et al. 2011). GBS methods have been used successfully in a variety of applications, including phylogenetics (Rubin et al. 2012), population genomics (White et al. 2013), genome-wide association studies (Parchman et al. 2012), speciation genomics (Taylor et al. 2014), and genetic mapping (Andolfatto et al. 2011).

A central challenge in analyzing GBS data is the high variation in coverage across individuals and across loci, creating uncertainty in SNP calls and genotype assignments (Davey et al. 2011). In addition to the polymerase chain reaction (PCR) and sequencing error associated with next-generation sequencing platforms, this cost-effective method of high-throughput genotyping comes with its own set of caveats: restriction fragment length bias and PCR GC content bias contribute to high variation in read depth among loci, and restriction-site polymorphism can skew allelic representation and therefore estimates of population genetic parameters (Arnold et al. 2013; Davey et al. 2013; Gautier et al. 2013). In the absence of a reference genome, spurious SNP calls may also result from collapsed paralogs or repeats during de novo assembly of reads into putative unique loci. Most GBS studies have used a set of heuristic criteria to filter out spurious sites, including read depth, proportion of missing data, and observed heterozygosity (Davey et al. 2011). While these simple filters are expected to discard most problematic loci during variant discovery, and applications such as trait mapping and phylogenetic inference may be robust to spurious calls at some loci, the use of GBS in population genomics studies may require careful consideration (Rubin et al. 2012; Arnold et al. …
