Academic journal article Genetics

A Powerful Variant-Set Association Test Based on Chi-Square Distribution

Article excerpt

(ProQuest: ... denotes formulae omitted.)

With the innovations of biomedical and biochemical technologies, large amounts of genetic sequencing data have been produced, providing researchers with great opportunities to investigate the genetic contributions to phenotypes such as cancers. Genome-wide association studies (GWASs) have successfully identified thousands of single nucleotide polymorphisms (SNPs) that are associated with some common diseases (Manolio et al. 2009; Chen and Ng 2012; Chen 2013; Chen et al. 2017b). However, most of the SNPs identified by GWASs are variants with relatively high minor allele frequencies (MAFs). Rare variants (e.g., SNPs with MAF < 5%) may play a critical role in disease development (Bodmer and Bonilla 2008). Nevertheless, because of their low MAFs, rare variants are usually removed from data analysis in GWASs. Even if they were included, current statistical methods designed for GWASs may have very limited power to detect the signal if the sample sizes are not large enough. Instead of testing a single variant at a time, researchers have proposed statistical approaches to detecting the possible association between a set of variants and a phenotype. Recently, many statistical methods have been designed specifically for gene-set or pathway rare-variant data analysis (Li and Leal 2008; Madsen and Browning 2009; Han and Pan 2010; Basu and Pan 2011; Lin and Tang 2011; Wu et al. 2011, 2015; Yi and Zhi 2011; Lee et al. 2012; Sha et al. 2012; Pan et al. 2014; Wang 2016; Chen et al. 2017a; Chen and Wang 2017).

The sequencing kernel association test (SKAT) is among the most popular rare-variant association testing methods. The SKAT is essentially based on principal component analysis (PCA). More specifically, it calculates a test statistic from each individual principal component of the covariance matrix of the genotype data, and then takes the weighted sum of these statistics as the overall test statistic, where the weights are the associated eigenvalues. The null distribution of the overall test statistic is a linear combination of chi-square distributions, which can be approximated by a chi-square distribution (Davies 1980; Liu et al. 2009), from which a P-value can be approximated.
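To make the decomposition concrete, the following Python sketch (using NumPy) writes a simplified statistic as an eigenvalue-weighted sum of per-component statistics. It is an illustration under stated assumptions, not the SKAT implementation: it uses no variant weights, no covariates (an intercept-only null model), and a quantitative phenotype. All function names are hypothetical. The null mixture is evaluated by Monte Carlo, and the chi-square approximation shown is the simple two-moment (Satterthwaite) match rather than the methods of Davies (1980) or Liu et al. (2009).

```python
import numpy as np

def pc_decomposition(y, G):
    """Return eigenvalues lam and squared component scores z2 with Q = sum(lam * z2).

    Assumptions (not from the article): intercept-only null model,
    unweighted variants, quantitative phenotype.
    """
    r = y - y.mean()                      # residuals under the intercept-only null
    Gc = G - G.mean(axis=0)               # column-centred genotype matrix
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    keep = s > 1e-10 * s.max()            # drop numerically null components
    lam = s[keep] ** 2                    # eigenvalues of Gc Gc' (same as Gc' Gc)
    z2 = (U[:, keep].T @ r) ** 2          # squared projection of r on each component
    return lam, z2

def mixture_pvalue(q_scaled, lam, n_draws=100_000, seed=0):
    """Monte Carlo P-value of q_scaled under the mixture sum_j lam_j * chi2_1."""
    rng = np.random.default_rng(seed)
    null = rng.chisquare(1, size=(n_draws, lam.size)) @ lam
    return float((null >= q_scaled).mean())

def satterthwaite(lam):
    """Scale a and degrees of freedom d so a * chi2_d matches the mixture's mean and variance."""
    a = (lam ** 2).sum() / lam.sum()
    d = lam.sum() ** 2 / (lam ** 2).sum()
    return a, d

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
n, k = 200, 10
G = rng.binomial(2, 0.03, size=(n, k)).astype(float)  # rare-variant genotypes coded 0/1/2
y = rng.normal(size=n)                                 # phenotype generated under the null

lam, z2 = pc_decomposition(y, G)
Q = float(lam @ z2)                      # eigenvalue-weighted sum of component statistics
sigma2 = y.var(ddof=1)                   # estimated phenotype variance
p_value = mixture_pvalue(Q / sigma2, lam)
a, d = satterthwaite(lam)
```

Because the eigenvalues act as weights, components with small eigenvalues contribute little to `Q` even when their squared scores `z2` are large.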

The optimal sequencing kernel association test (SKAT-O) is a weighted sum of the SKAT and a burden test; the burden test assumes that the effects of all of the rare variants under study have the same direction and similar magnitudes (Lee et al. 2012). Therefore, the SKAT-O in general is more robust than the SKAT. However, like the SKAT, the SKAT-O still uses the information from eigenvalues. In addition, both the SKAT and the SKAT-O require assigning a weight to each variant (e.g., a function of MAF).
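The phrase "weighted sum" can be made concrete. As a sketch, writing Q_SKAT and Q_burden for the two component statistics and ρ for the mixing weight (symbols introduced here for illustration; they do not appear in this excerpt), the combined statistic of Lee et al. (2012) takes the form:

```latex
Q_{\rho} = (1 - \rho)\, Q_{\mathrm{SKAT}} + \rho\, Q_{\mathrm{burden}}, \qquad 0 \le \rho \le 1 .
```

Setting ρ = 0 recovers the SKAT and ρ = 1 recovers the burden test; ρ is chosen over a grid of values to give the smallest P-value.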

The use of the eigenvalues as weights in the SKAT can be beneficial if the major principal components indeed have stronger associations with the phenotype. However, if this assumption is not met, the SKAT can potentially lose power dramatically. In addition, assigning weights to variants can be challenging. To circumvent these difficulties, in this paper, we propose a new statistical association testing method for rare-variant data analysis. This new test has some nice properties, such as a simple form and computational efficiency. To study the performance of the proposed approach, we compare it with some popular methods. Our comparison results show that the new test is more powerful than the SKAT and SKAT-O tests under most of the situations studied. Real data applications are also given to illustrate the use of the new approach.


We use $y = (y_1, y_2, \ldots, y_n)'$ to denote phenotypes (either qualitative or quantitative) of the $n$ subjects in a study. Assume $X_{n \times p}$ are the observations of $p$ covariates from $n$ subjects, and $G_{n \times k}$ are the $k$ genotypes from $n$ subjects, where the $(i, j)$ component of $G_{n \times k}$, $g_{ij}$, equals 0, 1, or 2 if the number of copies of the minor allele of the $j$th SNP from the $i$th subject is zero, one, or two, respectively. …
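As a small illustration of this coding, the Python sketch below (using NumPy) checks that a genotype matrix contains only 0, 1, and 2, computes each variant's minor allele frequency from the column means, and flags rare variants with the 5% threshold mentioned earlier. The function names and the toy matrix are hypothetical and are not taken from the article.

```python
import numpy as np

def minor_allele_frequencies(G):
    """Per-variant MAF from an n x k matrix of minor-allele counts (0, 1, or 2)."""
    G = np.asarray(G, dtype=float)
    if not np.isin(G, (0, 1, 2)).all():
        raise ValueError("genotypes must be coded 0, 1, or 2")
    freq = G.mean(axis=0) / 2.0           # each subject carries two alleles per SNP
    return np.minimum(freq, 1.0 - freq)   # guard in case the counted allele is not the minor one

def rare_variant_mask(G, threshold=0.05):
    """Boolean mask of variants whose MAF is below the threshold (5% by default)."""
    return minor_allele_frequencies(G) < threshold

# Toy example: 20 subjects, 2 SNPs.
G = np.zeros((20, 2))
G[0, 0] = 1        # SNP 1: a single heterozygous carrier
G[:, 1] = 1        # SNP 2: every subject is heterozygous
maf = minor_allele_frequencies(G)
rare = rare_variant_mask(G)
```

In this toy matrix the first SNP has one minor allele among 40 and is flagged as rare, while the second is common.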
