Academic journal article Genetics

Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics

Article excerpt

(ProQuest: ... denotes formulae omitted.)

UNTIL recently, there have been >2000 genome-wide association studies (GWAS) published for different traits or disease statuses (Hindorff et al. 2014). Most of them reported only regions of association, represented by the SNPs with the lowest P-values in each region. Only a few provide further information on the likely underlying causal variants. A notable exception is refinement based on Bayesian methods (Maller et al. 2012). Fine mapping the causal variants within verified association regions is an important step toward understanding the complex biological mechanisms linking the genetic code to various traits or phenotypes.

Fine-mapping methods can be roughly divided into two groups. The first group was developed before the availability of high-density genotype data. These fine-mapping methods assume the causal variants are not genotyped in the data and aim to identify a region as close as possible to the causal variants (Morris et al. 2002; Durrant et al. 2004; Liang and Chiu 2005; Zollner and Pritchard 2005; Minichiello and Durbin 2006; Waldron et al. 2006). Because the causal variants are not observed in the data, these methods usually rely on various strong assumptions to model the relationship between the causal and the observed variants. Examples include models based on coalescent theory (Morris et al. 2002; Zollner and Pritchard 2005; Minichiello and Durbin 2006) or statistical assumptions about the patterns of linkage disequilibrium (LD) (Liang and Chiu 2005). These methods have at least two limitations. First, the result is usually a region with a confidence value rather than a set of candidate causal variants. Second, the result may be unreliable if the model assumptions are too strict and deviate far from the real data, or the inferred region may be too wide to be useful if the model assumptions are too general.

The second group of fine-mapping methods assumes that the causal variants are among those measured. As the sequencing technology advances and with the availability of the HapMap Project (Altshuler et al. 2010) and the 1000 Genomes Project (Abecasis et al. 2012), it is feasible to obtain the sequence data of the association regions or impute almost all common variants with high quality. Now it is plausible to assume the causal variants exist in the data, either measured or imputed. How to best prioritize the candidate causal SNPs for follow-up functional studies becomes the aim of fine mapping (Faye et al. 2013). One simple way to prioritize variants is based on P-values. However, there are at least two limitations of this method. First, P-values do not give a comparable measure of the likelihood that a variant is causal across loci or across studies (Stephens and Balding 2009). Second, a noncausal variant could have the lowest P-value due to LD with a causal SNP and statistical fluctuation. This may also happen when a noncausal variant is in LD with multiple causal SNPs.
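The second limitation can be illustrated with a small simulation. The sketch below assumes a model that is common in summary-statistic fine mapping but is not stated in this excerpt: the z-scores of a causal SNP and a correlated noncausal SNP are jointly normal with unit variances, correlation r, and means equal to the noncentrality parameter and r times the noncentrality parameter, respectively. The function name, the noncentrality value, and the LD values are hypothetical choices for illustration.

```python
import math
import random


def noncausal_top_rate(ncp=4.0, r=0.9, n_sims=20000, seed=0):
    """Estimate how often a noncausal SNP in LD beats the causal SNP.

    Assumed model (an illustration, not taken from the article):
      z_causal ~ N(ncp, 1)
      z_tag    ~ N(r * ncp, 1), with corr(z_causal, z_tag) = r
    Returns the fraction of simulated studies in which |z_tag| > |z_causal|,
    i.e. the noncausal SNP has the lower P-value.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    resid_sd = math.sqrt(1.0 - r * r)  # noise in z_tag not shared with z_causal
    wins = 0
    for _ in range(n_sims):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        z_causal = ncp + e1
        # Build z_tag so that its noise has correlation r with e1.
        z_tag = r * ncp + r * e1 + resid_sd * e2
        if abs(z_tag) > abs(z_causal):
            wins += 1
    return wins / n_sims


if __name__ == "__main__":
    for ld in (0.5, 0.9, 0.99):
        print(f"r = {ld}: noncausal SNP ranks first in "
              f"{noncausal_top_rate(r=ld):.1%} of simulations")
```

Under these assumptions, the stronger the LD, the more often statistical fluctuation alone places the noncausal SNP ahead of the causal one, which is why ranking by P-value is not a reliable way to prioritize variants.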

Several methods have been developed to address these problems. For example, in Maller et al. (2012), a Bayesian method was developed to refine the association signal for 14 loci. This method circumvents the first limitation of using P-values by using the posterior inclusion probability (PIP). However, it assumes only a single causal variant for each locus. Recently, two fine-mapping methods, CAVIAR (Hormozdiari et al. 2014) and PAINTOR (Kichaev et al. 2014), were proposed, which lift the restriction of a single causal variant in a locus and show much better performance than other fine-mapping methods. Another advantage is that only the marginal test statistics and the correlation coefficients among SNPs are required, instead of the original genotype data, which makes it easier to share data among different groups. When only marginal test statistics are available, which is not uncommon, the correlation among SNPs in a study can be approximately computed from an appropriate reference population panel, e. …
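As a rough illustration of how a PIP can be obtained from marginal test statistics alone, the sketch below makes several assumptions that are not taken from the article: exactly one causal variant in the locus, an equal prior probability for every SNP, and a normal prior with variance W on the noncentrality parameter of the causal SNP's z-score. The Bayes factor for each SNP is then the ratio of the N(0, 1 + W) density to the N(0, 1) density at its z-score. This is a simplified approximate-Bayes-factor calculation in the spirit of single-causal-variant methods, not the exact procedure of Maller et al. (2012), CAVIAR, or PAINTOR; the function name and the default prior variance are hypothetical.

```python
import math


def single_causal_pips(z_scores, prior_ncp_variance=25.0):
    """Posterior inclusion probabilities under a single-causal-variant model.

    Assumptions (illustrative only):
      - exactly one SNP in the locus is causal, each equally likely a priori;
      - under the alternative, z ~ N(0, 1 + W), with W = prior_ncp_variance;
      - under the null, z ~ N(0, 1).
    """
    if not z_scores:
        raise ValueError("z_scores must not be empty")
    w = prior_ncp_variance
    shrink = w / (1.0 + w)
    # log Bayes factor: log N(z; 0, 1 + W) - log N(z; 0, 1)
    log_bfs = [-0.5 * math.log1p(w) + 0.5 * z * z * shrink for z in z_scores]
    # Normalize with the log-sum-exp trick to avoid overflow for large |z|.
    top = max(log_bfs)
    weights = [math.exp(lbf - top) for lbf in log_bfs]
    total = sum(weights)
    return [wt / total for wt in weights]


if __name__ == "__main__":
    z = [5.2, 4.9, 1.0, -0.3]  # made-up z-scores for four SNPs in one locus
    for i, pip in enumerate(single_causal_pips(z)):
        print(f"SNP {i}: z = {z[i]:+.1f}, PIP = {pip:.3f}")
```

Because the PIPs within a locus sum to one under this model, they can be compared across loci and studies in a way that raw P-values cannot. Methods that allow multiple causal variants additionally need the SNP correlation matrix, which this sketch does not use.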
