Academic journal article Genetics

Genomic Rearrangements in Arabidopsis Considered as Quantitative Traits

Article excerpt

(ProQuest: ... denotes formulae omitted.)

While genome resequencing can readily determine variations such as single-nucleotide polymorphisms (SNPs) and small indels, it remains challenging to identify structural variants (SVs) and rearrangements, despite improvements in algorithms for calling SVs. The current gold standard for determining SVs between individuals is de novo assembly (Simpson and Pop 2015). This requires high-coverage paired-end sequence over a range of insert sizes, together with long-range information, such as from long-read technologies (Chaisson and Tesler 2012; Jain et al. 2015), for scaffolding. The high cost and low throughput of de novo assembly limit its use and leave open two important questions. The first is whether an SV identified in an individual occurs frequently enough to contribute to phenotypic heritability in a population. The second is whether an SV represents a local rearrangement, such as a deletion, inversion, or tandem copy-number variant (CNV), or is long-range, such as a transposition (Cao et al. 2011; Mills et al. 2011).

SVs are frequently revealed by the anomalous alignment of short reads to the reference genome. Specific anomaly signatures characterize different types of SVs (Table 1). Thus, same-strand pairs indicate inversions, high read coverage indicates duplications, and abnormal insert sizes and unpaired reads indicate indels (Figure 1). These anomalies arise, often in combination, because the reads have been aligned to the wrong genome; they disappear if the reads are instead aligned to the true genome. This idea is used by algorithms such as GATK (McKenna et al. 2010) and Platypus (Rimmer et al. 2014), which identify small indels by local realignment, and in whole-genome reassembly by iterative realignment (Gan et al. 2011).
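The mapping from anomaly signatures to SV types described above can be illustrated with a minimal sketch. It is not the method of the article or of any cited tool: the `ReadPair` record, the function names `classify_pair` and `high_coverage`, and all thresholds (mean insert size of 300 bp, standard deviation of 30 bp, a 3-SD cutoff, a 1.5-fold coverage ratio) are assumptions introduced purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """Hypothetical summary of one aligned read pair (not a real library type)."""
    strand1: str                 # '+' or '-'
    strand2: str                 # '+' or '-'; ignored when the mate is unmapped
    insert_size: Optional[int]   # None when the mate is unmapped


def classify_pair(pair, mean_insert=300.0, sd_insert=30.0, n_sd=3.0):
    """Assign one read pair to an anomaly signature (thresholds are assumed)."""
    if pair.insert_size is None:
        return "unpaired"            # unpaired read: suggests an indel
    if pair.strand1 == pair.strand2:
        return "same_strand"         # same-strand pair: suggests an inversion
    if abs(pair.insert_size - mean_insert) > n_sd * sd_insert:
        return "abnormal_insert"     # abnormal insert size: suggests an indel
    return "normal"


def high_coverage(depth, expected_depth, ratio=1.5):
    """Flag a window whose read depth exceeds an assumed multiple of the
    expected depth, the signature associated with duplications."""
    return depth > ratio * expected_depth


# Four illustrative pairs, one per category.
pairs = [
    ReadPair("+", "-", 310),     # concordant pair
    ReadPair("+", "+", 295),     # both reads on the same strand
    ReadPair("+", "-", 900),     # insert far larger than expected
    ReadPair("-", "+", None),    # mate failed to map
]
signature_counts = Counter(classify_pair(p) for p in pairs)
```

In practice the signatures occur in combination at a single locus, so a per-pair label such as this would only be a first step; tallying the labels over a window, as `signature_counts` does for the toy input, gives a crude summary of the evidence.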

Many SV-calling algorithms utilize read-anomaly signatures to identify SVs segregating in individuals sequenced at high coverage (Chen et al. 2009; Manske and Kwiatkowski 2009; Ye et al. 2009; Simpson et al. 2010; Rausch et al. 2012; Sindi et al. 2012; Layer et al. 2014; Kronenberg et al. 2015). They focus on short-range SVs because of the difficulties in distinguishing long-range rearrangements from read-mapping errors. They also work best when calling SVs in individuals sequenced at intermediate to high coverage; for example, LUMPY (Layer et al. 2014) and WHAM (Kronenberg et al. 2015) are most sensitive at coverage >10X. In other applications, e.g., cancer resequencing, typical coverage is even higher, at 30X or above.

Further challenges arise when calling SVs in large samples of population sequence data, for the purpose of testing genetic association. Population sequencing provides an alternative to genotyping by SNP arrays, simultaneously providing both haplotype reference panels for imputation (Durbin et al. 2010), and cohorts for disease mapping (Cai et al. 2015; Nicod et al. 2016). As the sample size increases, the coverage of each individual may be reduced without affecting imputation accuracy (Davies et al. 2016). Although the information present in each sample is then sparse, and therefore it would be difficult to call SVs (and even SNPs) on an individual basis, by pooling information across samples it might be possible to determine common SVs analogously to the way SNPs are imputed.

In addition to simple indels, inversions, and transpositions, where a segment with well-defined breakpoints is affected, many SVs are composites of multiple events (Yalcin et al. 2011), often driven by transposons and other mobile elements. These complex SVs resist simple classification, and the precise sequence of mutations that occurred may be unrecoverable. While current algorithms for calling SVs in simulated high-coverage human data can identify simple SVs with sensitivities of ~90% depending on the type of SV (Kronenberg et al. 2015), they are less accurate when applied to real data, and their performance on complex SVs is unreported.

Despite this, there may still be strong evidence from read-mapping anomalies that an SV of some sort segregates at a locus. …
