Academic journal article Genetics

Increased Proportion of Variance Explained and Prediction Accuracy of Survival of Breast Cancer Patients with Use of Whole-Genome Multiomic Profiles

Article excerpt

(ProQuest: ... denotes formulae omitted.)

The continued development of high-throughput genomic technologies has fundamentally changed the genetic analyses of complex traits and diseases. These technologies provide large volumes of data from multiple "omic" layers, including the genome (e.g., SNPs, copy-number variants, and mutations), the epigenome (e.g., methylation), the transcriptome (e.g., RNA-seq), the proteome, and so on. This information can be used to develop models for understanding and predicting disease risk and disease prognosis. Recently, several studies have uncovered unprecedented numbers of omic factors associated with disease risk and progression. For instance, in the last decade, genome-wide association studies (GWAS) have reported large numbers of SNPs (e.g., http://www.genome.gov/gwastudies/) and structural variants [e.g., copy-number variants (Beroukhim et al. 2010; Morrow 2010)] associated with disease risk. Likewise, several studies have reported methylation sites (Dedeurwaerder et al. 2011; Fackler et al. 2011; Fang et al. 2011) and genes with expression profiles associated with prognosis (Perou et al. 2000; Sørlie et al. 2001; Van't Veer et al. 2002; Sotiriou and Pusztai 2009; Gyorffy et al. 2016). However, despite the tremendous progress achieved, use of this information in clinical practice remains limited, in part because the individual factors identified explain only a small proportion of the variance in disease risk or prognosis.

Data integration can be an avenue for improving our understanding and our ability to predict disease risk and prognosis. Integration can take place by combining information from multiple sites across the genome as well as by integrating inputs from different omics. In prediction of complex traits and disease risk, several studies (e.g., Purcell et al. 2009; de los Campos et al. 2010c; Yang et al. 2010; Makowsky et al. 2011; Vazquez et al. 2012) have demonstrated that the proportion of variance explained by use of whole-DNA profiles is considerably higher than that achieved by models that use a limited number of GWAS-significant variants. Likewise, several studies have demonstrated benefits of integrating data from multiple omics. For example, Chen et al. (2012) demonstrated how integrated omic profiles can provide insights into the development of type 2 diabetes. However, our ability to integrate whole-genome multilayer omic data into risk assessments still lags behind.

Wheeler et al. (2014) and Vazquez et al. (2014) proposed using what Wheeler called "Omic Kriging" for prediction of complex traits and disease risk using multiomic profiles. Kriging is a kernel-smoothing technique commonly used in spatial statistics (e.g., Cressie 2015). From a statistical perspective, kriging is the best linear unbiased predictor (BLUP) method commonly used in quantitative genetics (Henderson 1950; Robinson 1991) [using pedigree (Henderson 1950, 1975) or DNA information (G-BLUP) (VanRaden 2008)]. OmicKriging is a multikernel method (de los Campos et al. 2010a, b) in which the resulting kernel is a weighted average of similarity matrices derived from different omics.
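
The multikernel idea can be illustrated with a minimal sketch. This is not the OmicKriging package or the authors' implementation; the function names (linear_kernel, combine_kernels, kriging_predict), the cross-product form of the similarity matrix, and the shrinkage parameter lam (taken here as the ratio of residual to signal variance) are assumptions made for illustration only. The sketch builds one similarity matrix per omic layer, averages them with user-supplied weights, and predicts held-out individuals with the usual BLUP formula, mu + K[test, train] (K[train, train] + lam I)^-1 (y - mu).

```python
# Illustrative sketch only: all names and the toy data are hypothetical.

def linear_kernel(X):
    """Similarity matrix K = Z Z' / p from an n x p matrix X (columns centered)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    return [[sum(Z[i][k] * Z[j][k] for k in range(p)) / p for j in range(n)]
            for i in range(n)]

def combine_kernels(kernels, weights):
    """Weighted average of same-sized similarity matrices; weights must sum to 1."""
    if len(kernels) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per kernel, with weights summing to 1")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kriging_predict(K, y_train, train, test, lam):
    """BLUP/kriging prediction for the individuals indexed by `test`.

    K is the full (train + test) similarity matrix; `train` and `test` are
    lists of row indices; lam is an assumed residual-to-signal variance ratio.
    """
    mu = sum(y_train) / len(y_train)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(A, [y - mu for y in y_train])
    return [mu + sum(K[t][j] * a for j, a in zip(train, alpha)) for t in test]

if __name__ == "__main__":
    # Toy data: 4 individuals, two hypothetical omic layers (values are made up).
    snp = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
    expr = [[0.3, 1.2], [0.9, 0.4], [1.5, 0.1], [0.2, 1.0]]
    K = combine_kernels([linear_kernel(snp), linear_kernel(expr)], [0.5, 0.5])
    print(kriging_predict(K, [1.0, 2.0, 3.0], train=[0, 1, 2], test=[3], lam=1.0))
```

Because the layers enter only through a single averaged kernel, the same shrinkage is applied to every layer and no between-layer interaction term appears, which is the structure the limitations discussed next refer to.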

Although OmicKriging represents a promising method for integrating multiomic data, the method has potentially important limitations. First, the approach assumes that the architecture of effects is homogeneous across omic layers. This assumption may not hold if some omics have a sparse architecture of effects (i.e., a few factors have sizable effects, and the rest have no effect) and other omics have a nonsparse architecture of effects (i.e., all inputs have small effects). Second, OmicKriging assumes implicitly that omics act in an additive manner (i.e., there are no interactions between omics). This may fail, for instance, if the effects of one layer (e.g., SNP) are modulated by a second layer (e.g., methylation).

In this study, we describe a modeling framework that (1) allows integration of high-dimension inputs from multiple omic layers, (2) contemplates different effect architectures across layers, and (3) incorporates interactions between omics. …
