Optimal Design of Genetic Studies of Gene Expression with Two-Color Microarrays in Outbred Crosses

Article excerpt

ABSTRACT

Combining global gene-expression profiling and genetic analysis of natural allelic variation (genetical genomics) has great potential in dissecting the genetic pathways underlying complex phenotypes. Efficient use of microarrays is paramount in experimental design as the cost of conducting this type of study is high. For those organisms where recombinant inbred lines are available for mapping, the "distant pair design" maximizes the number of informative contrasts over all marker loci. Here, we describe an extension of this design, named the "optimal pair design," for use with F2 crosses between outbred lines. The performance of this design is investigated by simulation and compared to several other two-color microarray designs. We show that, for a given number of microarrays, the optimal pair design outperforms all other designs considered for detection of expression quantitative trait loci (eQTL) with additive effects by linkage analysis. We also discuss the suitability of this design for outbred crosses in organisms with large genomes and for detection of dominance.

GENETIC analysis of variation in gene expression, also known as genetical genomics (Jansen and Nap 2001), has great potential for dissecting the mechanisms underlying complex phenotypes (Mehrabian et al. 2005; Schadt et al. 2005). Although variation in transcript abundance is often in response to external environmental factors, part of the between-individual variation in expression of a substantial number of genes can be explained by DNA polymorphisms (Jin et al. 2001). To date, the vast majority of published studies in this research area have been conducted in model organisms such as yeast (Brem et al. 2002), flowering plant (Keurentjes et al. 2007), nematode worm (Li et al. 2006), mouse (Schadt et al. 2003; Bystrykh et al. 2005), and rat (Hubner et al. 2005). There are also a number of studies that focused on human populations (Monks et al. 2004; Morley et al. 2004; Stranger et al. 2005). Efforts in mapping expression quantitative trait loci (eQTL) have provided strong evidence for candidate gene selection in studies of complex phenotypes such as hypertension (Hubner et al. 2005) and childhood asthma (Dixon et al. 2007).

As in any QTL study, appropriate sample size is essential for adequate power in eQTL detection. Although many of the published studies have provided very interesting insights into the properties of genetic loci that regulate gene-expression phenotypes, the small sample sizes of the early studies meant that they had limited power to detect eQTL of small to moderate effects (De Koning and Haley 2005). In many cases, there is no shortage of animals or cell lines for a genetical genomics approach as the genetic materials have already been collected for concurrent large-scale studies. Therefore, the major factor that restricts sample sizes tends to be the high cost of the associated technologies, particularly the cost of microarrays. To address this issue, significant improvements in the usage of microarray resources for genetical genomics have been proposed in a number of articles. Jin et al. (2004) presented an algorithm for "selective phenotyping" in which a subsample was chosen from the entire sample set for maximum genotypic dissimilarity as a way to reduce the amount of phenotyping without sacrificing sensitivity in QTL detection. In a different article, Piepho (2005) discussed the optimal allocation of samples to cDNA microarrays for detecting heterosis. Bueno Filho et al. (2006) covered a range of optimal microarray designs, from studying the genotypic effect of a single locus to models that include both fixed treatment and random polygenic effects. Rosa et al. (2006) provided a comprehensive review on microarray design for eQTL mapping. Fu and Jansen (2006) proposed a more general approach called the distant pair design, which combines optimal allocation by hybridizing most dissimilar samples and selective genotyping when the population resource is large. …
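The idea of hybridizing the most dissimilar samples on each two-color array can be illustrated with a small sketch. This is not the algorithm of Fu and Jansen (2006) or the optimal pair design described here; it is a simplified greedy heuristic under stated assumptions: genotypes are coded as strings of 'A' and 'B' at each marker (as for recombinant inbred lines), dissimilarity is the number of markers at which two individuals differ, and each individual is used on at most one array. The function names and the example genotypes are hypothetical.

```python
from itertools import combinations


def genotype_distance(a, b):
    """Count the marker loci at which two individuals carry different genotypes."""
    return sum(1 for x, y in zip(a, b) if x != y)


def greedy_dissimilar_pairs(genotypes, n_arrays):
    """Greedily pick up to n_arrays disjoint pairs, most dissimilar pairs first.

    Assumption: a greedy pass is only an approximation; it does not guarantee
    the maximum total number of informative contrasts over all marker loci.
    """
    # Rank every candidate pair by genotypic distance, largest first.
    ranked = sorted(
        combinations(range(len(genotypes)), 2),
        key=lambda p: genotype_distance(genotypes[p[0]], genotypes[p[1]]),
        reverse=True,
    )
    used, pairs = set(), []
    for i, j in ranked:
        if len(pairs) == n_arrays:
            break
        # Each individual is hybridized on at most one array.
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs


# Hypothetical genotypes at four markers for five individuals.
genotypes = ["AAAA", "BBBB", "AABB", "BBAA", "AAAB"]
pairs = greedy_dissimilar_pairs(genotypes, n_arrays=2)
print(pairs)
```

In this toy example the two selected pairs differ at every marker, so each array contrasts alternative genotypes at all four loci. With real data, ties and linkage between markers would make an exhaustive or simulation-based search preferable to the greedy pass.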