Academic journal article Genetics

High-Confidence Discovery of Genetic Network Regulators in Expression Quantitative Trait Loci Data

Article excerpt

ABSTRACT

Expression QTL (eQTL) studies involve the collection of microarray gene expression data and genetic marker data from segregating individuals in a population to search for genetic determinants of differential gene expression. Previous studies have found large numbers of trans-regulated genes (regulated by unlinked genetic loci) that link to a single locus or eQTL "hotspot," and it would be desirable to find the mechanism of coregulation for these gene groups. However, current network reconstruction algorithms face many difficulties, such as low power and high computational cost. A common observation for biological networks is that they have a scale-free or power-law architecture. In such an architecture, highly influential nodes exist that have many connections to other nodes. If we assume that this type of architecture applies to genetic networks, then we can simplify the problem of genetic network reconstruction by focusing on discovery of the key regulatory genes at the top of the network. We introduce the concept of "shielding," in which a specific gene expression variable (the shielder) renders a set of other gene expression variables (the shielded genes) independent of the eQTL. We build networks iteratively, working downward from the eQTL through the shielder, using tests of conditional independence. We propose a novel test for controlling the shielder false-positive rate at a predetermined level by requiring a threshold number of shielded genes per shielder. Using simulation, we demonstrate that we can control the shielder false-positive rate as well as obtain high shielder and edge specificity. In addition, we show our method to be robust to violation of the latent variable assumption, an important feature in the practical application of our method. We have applied our method to a yeast expression QTL data set in which microarray and marker data were collected from the progeny of a backcross of two strains of Saccharomyces cerevisiae (Brem et al. 2002). Seven genetic networks have been discovered, and bioinformatic analysis of the discovered regulators and corresponding regulated genes has generated plausible hypotheses for mechanisms of regulation that can be tested in future experiments.
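
The shielding idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' test. It assumes a simple linear chain (marker -> shielder -> target), uses a first-order partial correlation with a Fisher z approximation as a stand-in for a conditional independence test, and treats the binary marker as if it were continuous. All function names, effect sizes, and the sample size are hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(r, n, n_cond=0):
    """Approximate two-sided p-value for H0: (partial) correlation is zero."""
    z = math.atanh(r) * math.sqrt(n - 3 - n_cond)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n=2000, seed=0):
    """Assumed toy model: marker genotype -> shielder expression -> target expression."""
    rng = random.Random(seed)
    marker = [rng.randint(0, 1) for _ in range(n)]            # two genotype classes
    shielder = [2.0 * m + rng.gauss(0, 1) for m in marker]    # regulated by the eQTL
    target = [s + rng.gauss(0, 1) for s in shielder]          # regulated only via the shielder
    return marker, shielder, target


marker, shielder, target = simulate()
n = len(marker)

# Marginally, the target is associated with the marker.
r_marginal = pearson(marker, target)
p_marginal = fisher_z_pvalue(r_marginal, n)

# Conditioning on the shielder should remove that association.
r_given_shielder = partial_corr(marker, target, shielder)
p_conditional = fisher_z_pvalue(r_given_shielder, n, n_cond=1)

print(f"marker~target:            r = {r_marginal:.3f}, p = {p_marginal:.3g}")
print(f"marker~target | shielder: r = {r_given_shielder:.3f}, p = {p_conditional:.3g}")
```

In this toy setting the marginal correlation is large while the partial correlation given the shielder is close to zero, which is the pattern the abstract describes as the shielder rendering the shielded gene independent of the eQTL.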


Technological advances in recent years have given biological researchers access to genomic, transcriptomic, proteomic, and other -omic data at an unprecedented scale. Such data sources describe genetic regulation on multiple levels, and mining these data offers hope of unraveling complex genetic networks. For instance, detailed observations about variation in gene expression as a function of natural sequence variation or variation in experimental conditions can potentially be analyzed to learn regulatory relationships among genes.

While genetic network prediction offers many benefits, predicting networks from such large data sets poses many computational and statistical challenges. The "large P, small N" problem is compounded in network prediction because the number of unknowns grows from P variables to P(P - 1) candidate edges, since in principle a directed edge can exist between any ordered pair of variables. Searching the large space of possible networks is computationally difficult, and inferring the correct network from that space with a limited sample size is statistically difficult.
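
To make the growth from P to P(P - 1) concrete, the following minimal sketch counts candidate directed edges and gives a rough size for the space of directed graphs. The function names are hypothetical, and the values of P, including P = 6000, are chosen only for illustration. The graph count 2^(P(P - 1)) treats every ordered pair as independently present or absent, so it includes cyclic graphs and is an upper bound on the number of acyclic networks.

```python
import math


def n_directed_edges(p):
    """Number of ordered pairs (i, j), i != j, among p variables."""
    return p * (p - 1)


def log10_n_directed_graphs(p):
    """log10 of 2**(p*(p-1)), the count of directed graphs without self-loops."""
    return n_directed_edges(p) * math.log10(2)


for p in (10, 100, 6000):
    edges = n_directed_edges(p)
    digits = log10_n_directed_graphs(p)
    print(f"P = {p:>5}: {edges:>10} candidate edges, about 10^{digits:.0f} directed graphs")
```

Even at P = 10 there are 90 candidate edges, while a typical eQTL study has far fewer samples than candidate edges at genome scale.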

Despite such seemingly insurmountable challenges, various researchers have proposed methods for genetic network discovery in genomic data sets. Network discovery techniques were first applied to genomic data by Friedman et al. (2000), who used Bayesian networks to discover network structure in a yeast cell cycle microarray gene expression data set. The authors used the "sparse candidate" algorithm for network discovery, which limits the number of possible parents for each node and thus dramatically reduces the size of the network search space. …
