Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data

By Fragoso, Christopher A.; Heffelfinger, Christopher et al. | Genetics, February 2016

Fragoso, Christopher A., Heffelfinger, Christopher, Zhao, Hongyu, Dellaporta, Stephen L., Genetics


(ProQuest: ... denotes formulae omitted.)

The imputation of missing genotype data has been a key research topic in statistical genetics since well before the advent of next-generation sequencing (NGS) technologies. The goal of many of these algorithms was to reconstruct haplotypes from Sanger or microarray-based genotyping, usually on human populations. Strategies employing the expectation-maximization algorithm (Hawley and Kidd 1995; Long et al. 1995; Qin et al. 2002; Scheet and Stephens 2006), Bayesian inference (Niu et al. 2002; Stephens and Donnelly 2003), or Markovian methodology (Stephens et al. 2001; Broman et al. 2003; Broman and Sen 2009) to infer local ancestry and gametic phase could be used to resolve missing markers within a population (Browning and Browning 2011). In these cases, missing genotypes were assigned based on the most likely proximal haplotypes. These computational methods greatly increased the informative content of genotyping information, especially for population studies (Spencer et al. 2009; Cleveland et al. 2011). While these programs were powerful and accurate, they could also be computationally expensive. Further, they assumed that available genotypes were largely correct, which could cause issues with sequencing data sets.

The development of programs that focused primarily on the imputation of missing data and haplotype phasing was likely motivated by several factors. Genome-wide association studies could be enhanced by the inference of additional markers using large multipopulation data sets such as the International HapMap Project (International HapMap Consortium et al. 2010). The emergence of the meta-analysis led to a need for algorithms that could merge disparate data sets (Browning and Browning 2007; Howie et al. 2009; Li et al. 2010; Liu et al. 2013; Fuchsberger et al. 2015). These algorithms often employed large haplotype reference panels to improve imputation (Marchini et al. 2007; Browning and Browning 2009; Howie et al. 2009). In biallelic recombinant plant populations, a parental reference panel is sufficient to explain the genetic structure of the offspring (Yu et al. 2008), but reference panels are often not available.

Genome resequencing has become a critical tool for characterizing genetic diversity in plant populations. Unlike genotyping and PCR-based assays, sequencing can characterize large numbers of useful markers without a priori knowledge of a given population's genetic diversity. However, when applying genome resequencing to the study of large populations, both time and cost must be considered. Sequencing methods employing multiplexing, the simultaneous sequencing of multiple samples in a single pool, have been developed to enhance efficiency and reduce sample costs. These methods include multiplexed whole-genome sequencing (WGS), whole-exome sequencing (WES), restriction-site-associated DNA markers (RAD), and genotype by sequencing (GBS) (Miller et al. 2007; Broman and Sen 2009; Wu et al. 2010; Bamshad et al. 2011; Elshire et al. 2011; Li et al. 2011; Nielsen et al. 2011; 1000 Genomes Project Consortium et al. 2012; Heffelfinger et al. 2014). WES, RAD, and GBS, collectively called reduced-representation sequencing (RRS) methods, interrogate a small but consistent portion of a genome. The tradeoff occurs when large numbers of samples are pooled and sequenced together: individual per-sample and per-site coverage can be highly variable.

Any low-coverage sequencing method will result in missing and erroneous genotypes. Missing data occur when sequencing coverage is insufficient to interrogate every available site and allele in each sample. Although a RRS experiment is restricted by design to a subset of the total number of alleles, it is highly unlikely that the entire set of available sites and alleles will be recovered in each sample. The proportion of unrecovered alleles increases with marker density and the level of multiplexing. Missing data manifest in two forms. …
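
To make the link between coverage and unrecovered alleles concrete, the following is a minimal sketch under a simple model that is an assumption here, not something taken from the article: read depth at a site is Poisson-distributed with a given mean, and at a heterozygous site each read samples either allele with probability 1/2. The function names are hypothetical and chosen only for illustration.

```python
import math


def prob_no_reads(mean_coverage: float) -> float:
    """Probability that a site receives zero reads (assumed Poisson depth)."""
    return math.exp(-mean_coverage)


def prob_one_allele_only(mean_coverage: float) -> float:
    """Probability that a heterozygous site is covered but shows one allele only.

    Assumption: reads split evenly between alleles, so each allele's read
    count is an independent Poisson variable with mean mean_coverage / 2.
    """
    p_allele_unseen = math.exp(-mean_coverage / 2)
    # Exactly one of the two alleles has zero reads.
    return 2 * p_allele_unseen * (1 - p_allele_unseen)


def prob_both_alleles(mean_coverage: float) -> float:
    """Probability that both alleles of a heterozygous site are observed."""
    p_allele_unseen = math.exp(-mean_coverage / 2)
    return (1 - p_allele_unseen) ** 2


if __name__ == "__main__":
    # Illustrative coverage values only; they are not from the article.
    for cov in (0.5, 1.0, 2.0, 5.0):
        print(
            f"mean coverage {cov:>4}: "
            f"no reads={prob_no_reads(cov):.3f}, "
            f"one allele only={prob_one_allele_only(cov):.3f}, "
            f"both alleles={prob_both_alleles(cov):.3f}"
        )
```

Under these assumptions the three outcomes are exhaustive, so their probabilities sum to 1 at any coverage. Real multiplexed libraries typically show more variable depth than a Poisson model predicts, so these values should be read as optimistic.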
