Academic journal article Genetics

A Bayesian Framework for Generalized Linear Mixed Modeling Identifies New Candidate Loci for Late-Onset Alzheimer's Disease

Article excerpt

(ProQuest: ... denotes formulae omitted.)

LINKING genomic variants to traits is central to discovering the mechanisms of genetic diseases. To date, the National Human Genome Research Institute (NHGRI) has curated >1750 publications of genome-wide association studies (GWAS) that considered at least 100,000 single nucleotide polymorphisms (SNPs) (Manolio 2010; Welter et al. 2014). The adoption of high-throughput sequencing technology has facilitated the rapid identification of potentially causal variants. The 1000 Genomes Project has characterized ~88 million variants by whole-genome sequencing (WGS) of 2504 individuals from 26 populations (Auton et al. 2015). Such sequencing approaches to genomic association will soon enable discovery at base-pair resolution. Meanwhile, statistical methods for GWAS have evolved from odds ratio tests, to generalized linear regression models (LMs), to more sophisticated multivariate linear mixed models (LMMs). LMM approaches have the capacity to correct for population structure and sample relatedness (Henderson 1953), thereby minimizing false positives due to allelic cosegregation. Consequently, the number of LMM-compatible computational tools for genetic studies is rapidly increasing, e.g., ASReml, TASSEL, EMMA, QTLRel, FaST-LMM, DOQTL, GEMMA, and GMMAT (Gilmour et al. 1995; Kang et al. 2008; Zhang et al. 2010; Cheng et al. 2011; Lippert et al. 2011; Gatti et al. 2014; Zhou and Stephens 2014; Chen et al. 2016).

While LMMs are efficient in correcting for sample relatedness, they restrict response variables to numerical values. Phenotypic traits in GWAS, however, are often categorical, such as binary variables in case-control studies or multilevel ordered categorical variables that correspond to disease stages. To model discrete response variables in the context of mixed models for population relatedness correction, generalized LMMs (GLMMs) are required. Chen et al. (2016) published a method that handles a binary response variable in the context of a mixed model, but it does not support multilevel categorical variables. Current approaches commonly transform categorical variables into continuous variables to fit LMMs, under the assumption that the trait has constant residual variance. A categorical trait often violates this assumption, which can bias effect estimates.
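
The constant-variance problem can be seen in a minimal sketch for the binary case. A 0/1 trait with case probability p has variance p(1 - p), so any variant that shifts p also shifts the residual variance. The case probabilities below are hypothetical values chosen for illustration; they do not come from the article.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 trait whose probability of being a case is p."""
    return p * (1.0 - p)


# Hypothetical case probabilities for carriers of 0, 1, or 2 risk alleles
# (illustrative values only, not taken from any study).
case_prob_by_genotype = {0: 0.10, 1: 0.25, 2: 0.50}

# The variance differs by genotype group, so a linear model that assumes
# one shared residual variance is misspecified for this trait.
residual_variance = {
    genotype: bernoulli_variance(p)
    for genotype, p in case_prob_by_genotype.items()
}

for genotype, variance in residual_variance.items():
    p = case_prob_by_genotype[genotype]
    print(f"risk alleles={genotype}  P(case)={p:.2f}  variance={variance:.4f}")
```

A model with a logit or probit link lets the variance follow the mean instead of fixing it, which is one reason a GLMM is the natural choice here.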

The proliferation of multiple GWAS for a single disease has also generated a need for methods to systematically combine results from multiple studies. Such efforts, often pursued as meta-analyses, can dramatically boost statistical power through an increase in sample size (Kavvoura and Ioannidis 2008). However, association strengths of a given variant or a genetic locus typically fluctuate across studies, which may be due to different population compositions, environmental exposures, clinical reporting standards, and experimental platforms. As a result, it is often difficult or impossible to merge raw data from different studies into a single association model. Furthermore, a more general integration of prior information is often desirable, such as coexpression or other correlations between genes. Integration approaches with more flexibility are needed to address these issues.

To address these challenges, we created the Bayes-GLMM method, which exploits the flexibility of a Bayesian modeling framework and the computing efficiency of the recently developed statistical programming language Stan (http://mc-stan.org; Carpenter et al. 2017). Under a Bayesian strategy, model parameters are treated as stochastic rather than fixed, as they are in frequentist approaches (Gelman et al. 2013). The stochastic nature of Bayesian modeling provides a coherent way to incorporate published results from a related GWAS: the prior distributions of the statistics of interest are configured from those results, and posterior probabilities are computed given new data (Verzilli et al. 2008; Newcombe et al. 2009; Stephens and Balding 2009). …
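
The idea of turning a published result into a prior can be illustrated with a deliberately simplified sketch: a normal prior on a single effect size, updated with a normally distributed estimate from new data. This conjugate update is a textbook example and is not the Bayes-GLMM implementation, which is fitted in Stan. The function name `normal_posterior` and all numbers are hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_sd, estimate, estimate_se):
    """Combine a normal prior with a normal estimate from new data.

    Returns the posterior mean and standard deviation for the effect,
    using the standard precision-weighted (conjugate normal) update.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / estimate_se ** 2
    posterior_precision = prior_precision + data_precision
    posterior_mean = (
        prior_precision * prior_mean + data_precision * estimate
    ) / posterior_precision
    return posterior_mean, math.sqrt(1.0 / posterior_precision)


# Hypothetical numbers: a published study reported an effect of 0.20
# (SE 0.10), used here as the prior; the new cohort estimates 0.40 (SE 0.10).
post_mean, post_sd = normal_posterior(0.20, 0.10, 0.40, 0.10)
print(f"posterior mean={post_mean:.3f}, posterior sd={post_sd:.4f}")
```

With equal precisions the posterior mean falls midway between the prior and the new estimate, and the posterior standard deviation is smaller than either input, which reflects the gain in information from combining studies.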
