Academic journal article Genetics

Accurate Discovery of Expression Quantitative Trait Loci under Confounding from Spurious and Genuine Regulatory Hotspots

Article excerpt

ABSTRACT

In genomewide mapping of expression quantitative trait loci (eQTL), it is widely believed that thousands of genes are trans-regulated by a small number of genomic regions called "regulatory hotspots," resulting in "trans-regulatory bands" in an eQTL map. As several recent studies have demonstrated, technical confounding factors such as batch effects can complicate eQTL analysis by causing many spurious associations including spurious regulatory hotspots. Yet little is understood about how these technical confounding factors affect eQTL analyses and how to correct for these factors. Our analysis of data sets with biological replicates suggests that it is this intersample correlation structure inherent in expression data that leads to spurious associations between genetic loci and a large number of transcripts inducing spurious regulatory hotspots. We propose a statistical method that corrects for the spurious associations caused by complex intersample correlation of expression measurements in eQTL mapping. Applying our intersample correlation emended (ICE) eQTL mapping method to mouse, yeast, and human identifies many more cis associations while eliminating most of the spurious trans associations. The concordances of cis and trans associations have consistently increased between different replicates, tissues, and populations, demonstrating the higher accuracy of our method to identify real genetic effects.

GENOMEWIDE analysis of gene expression data in segregating populations has been widely conducted to understand the genetic basis of regulation in many organisms including yeast (Brem and Kruglyak 2005), Arabidopsis (Keurentjes et al. 2007), mouse (Bystrykh et al. 2005; Chesler et al. 2005), and human (Cheung et al. 2005; Stranger et al. 2007). To understand the complex regulatory network, numerous statistical analysis methods have been proposed, including clustering of coregulated genes (Yvert et al. 2003), multipoint linkage analysis (Brem et al. 2005; Storey et al. 2005), prediction of regulatory modules (Ghazalpour et al. 2006; Lee et al. 2006), and pathway enrichment analysis (Subramanian et al. 2005; Ye and Eskin 2007).

Among these "genetical genomics" approaches, the most widely used statistical analysis is expression quantitative trait loci (eQTL) mapping between genetic variation and gene expression levels (Brem et al. 2002). The goal of these studies is to identify associations between an individual genetic variation and the differential expression of a gene that might help explain the transcriptional regulation of the gene. Many recent studies have identified a large number of cis associations between eQTL and the expression of genes in close proximity. They have also identified many more trans associations between eQTL and the expression of genes in other regions of the genome (Yvert et al. 2003; Chesler et al. 2005; Hubner et al. 2005). An interesting observation consistent across multiple data sets is that hundreds or even thousands of genes are trans-regulated by a small number of genomic regions called "regulatory hotspots" (Chesler et al. 2005; Keurentjes et al. 2007) and these associations appear as "trans-regulatory bands" in eQTL plots regardless of the normalization method used (Bystrykh et al. 2005; Chesler et al. 2005, 2006; Hubner et al. 2005; Peirce et al. 2006; Williams et al. 2006).
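To make the idea of a marker-by-transcript association scan concrete, the following is a minimal sketch in Python. It is not the method proposed in this article: it assumes a plain simple-regression t-test for every marker and gene pair, uses simulated data with one hypothetical planted effect, and applies an arbitrary illustrative threshold. The function names `association_tstats` and `genes_per_marker` are hypothetical. Counting, for each marker, the genes that pass the threshold is one simple way to see how a "regulatory hotspot" would show up as a marker linked to many genes.

```python
import numpy as np

def association_tstats(genotypes, expression):
    """Return a (markers x genes) matrix of simple-regression t-statistics.

    genotypes:  (n_samples, n_markers) array of allele codes
    expression: (n_samples, n_genes) array of expression levels
    """
    n = genotypes.shape[0]
    # Standardize columns so the scaled cross-product is the Pearson correlation.
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    e = (expression - expression.mean(axis=0)) / expression.std(axis=0)
    r = g.T @ e / n
    # t-statistic for the slope of a simple regression (n - 2 degrees of freedom).
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

def genes_per_marker(tstats, threshold):
    """Count, for each marker, the genes whose |t| exceeds the threshold."""
    return (np.abs(tstats) > threshold).sum(axis=1)

# Simulated data (assumption: binary genotypes, as in a cross of inbred strains).
rng = np.random.default_rng(0)
n_samples, n_markers, n_genes = 100, 5, 20
genotypes = rng.integers(0, 2, size=(n_samples, n_markers)).astype(float)
expression = rng.normal(size=(n_samples, n_genes))
# Hypothetical planted effect: marker 2 shifts the expression of gene 7.
expression[:, 7] += 3.0 * genotypes[:, 2]

tstats = association_tstats(genotypes, expression)
counts = genes_per_marker(tstats, threshold=6.0)  # illustrative threshold only
print(counts)
```

Classifying each detected association as cis or trans would additionally require the genomic positions of markers and genes, which this sketch does not simulate.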

Recent genetical genomics studies of yeast have provided much evidence supporting the existence of global regulators that induce trans-regulatory bands (Foss et al. 2007; Perlstein et al. 2007). For mammalian expression data sets, although large numbers of regulatory hotspots have consistently been observed, the locations of these regulatory hotspots are inconsistent between different data sets (Chesler et al. 2005; Hubner et al. 2005; Peirce et al. 2006). Simulation studies suggest that spurious regulatory hotspots may be frequently observed in outbred populations (Perez-Enciso 2004; de Koning and Haley 2005; Wang et al. …
