Academic journal article Genetics

Gene Network Inference Via Structural Equation Modeling in Genetical Genomics Experiments

Article excerpt

ABSTRACT

Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.
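The assembly of regulator-target pairs into an EDN can be illustrated with a small sketch. This is not the authors' implementation: the node names (eQTL1, geneA, and so on), the helper functions build_edn and has_cycle, and the example pairs are all hypothetical. It only shows how retained directed pairs form a graph and how a feedback relationship appears as a directed cycle.

```python
from collections import defaultdict


def build_edn(pairs):
    """Assemble (regulator, target) pairs into a directed adjacency map."""
    graph = defaultdict(set)
    for regulator, target in pairs:
        graph[regulator].add(target)
        graph.setdefault(target, set())  # make sure pure targets are nodes too
    return dict(graph)


def has_cycle(graph):
    """Depth-first search; a back edge to a node on the current path is a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Hypothetical retained pairs, one per edge type named in the abstract.
pairs = [
    ("eQTL1", "geneA"),  # eQTL -> cis-regulated target gene
    ("geneA", "geneB"),  # cis-regulated gene -> cis-trans-regulated target
    ("geneB", "geneC"),  # trans-regulator gene -> target gene
    ("eQTL2", "geneC"),  # trans-eQTL -> target gene
    ("geneC", "geneA"),  # assumed feedback edge closing a cycle
]

edn = build_edn(pairs)
print(sorted(edn))      # all nodes (genes and eQTL)
print(has_cycle(edn))   # whether the assembled graph contains feedback
```

A graph that contains such a cycle cannot be represented as a directed acyclic graph, which is the motivation the abstract gives for using SEM within the EDN.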

(ProQuest: ... denotes formulae omitted.)

Systems biologists are interested in understanding how DNA, RNA, proteins, and metabolites work together as a complex functional network. Projecting this network onto the gene space (Brazhnik et al. 2002) yields a gene network, where only the relationships between genes are modeled, although the physical interactions between genes are mediated through other components. While networks including genes, RNA, proteins, and metabolites would be more informative, gene networks are system-level descriptions of cellular physiology and provide an understanding of the genetic architecture of complex traits and diseases.

Bayesian networks are currently a popular tool for gene network inference (Friedman et al. 2000; Pe'er et al. 2001; Hartemink et al. 2002; Imoto et al. 2002; Yoo et al. 2002). Bayesian networks use partially directed graphical models to represent conditional independence relationships among variables of interest and are suitable for learning from noisy data (e.g., microarray data) (Friedman et al. 2000). Bayesian networks are directed acyclic graphical (DAG) models, which cannot represent structures with cyclic relationships. However, gene networks reconstructed on the basis of genetical genomics (or other perturbation) experiments are expected and have been found to be cyclic. Gene networks are phenomenological networks whose edges represent causal influences. These can be physical binding of a transcriptional regulator to the target promoter or more complicated biochemical mechanisms (involving signal transduction and metabolism), as there is much genetic regulation beyond transcription factors (Brazhnik et al. 2002). Recent articles point to the need for methods that can infer cyclic networks, note the limitation of the Bayesian network approach (Lum et al. 2006), and show better performance of a linear regression method over a Bayesian network algorithm most likely due to the presence of cycles (Faith et al. 2007). An alternative approach to the reconstruction of directed cyclic networks (directed cyclic graphs, DCGs) is based on the assumption that a cyclic graph represents a dynamic system at equilibrium (Fisher 1970) and includes a time dimension to produce a causal graph without cycles (DAG), which then can be studied using Bayesian networks, an approach called dynamic Bayesian networks (Murphy and Mian 1999; Hartemink et al. 2002). However, this approach requires the collection of time series data, which is difficult to accomplish, as it requires synchronization of cells and close time intervals not allowing for feedback (Spirtes et al. …
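The idea that a cyclic graph can be read as a dynamic system at equilibrium can be made concrete with a two-gene feedback loop. The sketch below is a generic linear illustration under assumed coefficients, not the model used in the article (whose formulae are omitted from this excerpt). The symbols b12, b21, u1, and u2 are hypothetical: b12 and b21 are the mutual regulatory effects, and u1 and u2 collect each gene's eQTL effect and residual. Unrolling the loop over time steps gives an acyclic, time-indexed update; when the absolute value of b12*b21 is below 1, that update settles at the same values as the simultaneous (cyclic) equations.

```python
def equilibrium_closed_form(b12, b21, u1, u2):
    """Solve y1 = b12*y2 + u1 and y2 = b21*y1 + u2 simultaneously."""
    det = 1.0 - b12 * b21
    if det == 0.0:
        raise ValueError("no unique equilibrium when b12 * b21 == 1")
    y1 = (u1 + b12 * u2) / det
    y2 = (b21 * u1 + u2) / det
    return y1, y2


def equilibrium_by_iteration(b12, b21, u1, u2, steps=200):
    """Unroll the feedback loop in time: each step uses only the previous state."""
    y1, y2 = 0.0, 0.0
    for _ in range(steps):
        # The right-hand side is evaluated before assignment, so this is a
        # synchronous update from time t to time t + 1.
        y1, y2 = b12 * y2 + u1, b21 * y1 + u2
    return y1, y2


# Assumed illustrative values: gene 2 represses gene 1, gene 1 activates gene 2.
b12, b21 = -0.5, 0.8
u1, u2 = 1.0, 0.2

closed = equilibrium_closed_form(b12, b21, u1, u2)
iterated = equilibrium_by_iteration(b12, b21, u1, u2)
print(closed)
print(iterated)
```

The time-unrolled version needs a measurement at every step, which is the time series requirement described above, whereas the simultaneous equations need only steady-state observations.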
