Academic journal article Environmental Health Perspectives

Laying a Community-Based Foundation for Data-Driven Semantic Standards in Environmental Health Sciences

Article excerpt

Introduction

This review is derived from a workshop held at North Carolina State University, Raleigh, North Carolina, USA, on 15-16 September 2014. The sharing, analysis, and integration of environmental health science (EHS) data are limited by a lack of data standards, in particular common language standards. Language standards are shared vocabularies used for data annotation and for specifying common data elements to aid interoperability. They may be as complex as an ontology, in which the terms and the relations between them are defined using logic and expressed in computable languages such as the Web Ontology Language (OWL 2016), or as simple as a hierarchical vocabulary. This workshop aimed to a) articulate research areas that would be advanced by EHS language standards and data interoperability, b) identify a community to initiate the creation and champion the extension of EHS language standards, and c) develop guidelines for the development of EHS standards.
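As a rough illustration of the simpler end of that range, a hierarchical vocabulary can be sketched as a mapping from each term to its broader term. The terms and the single `IS_A` relation below are hypothetical, chosen only to echo examples in this article; they are not drawn from any published EHS standard, and a real ontology in OWL would carry far richer logical definitions.

```python
# Hypothetical toy vocabulary: each term points to its broader ("is_a") term.
# The terms are illustrative assumptions, not part of any real EHS standard.
IS_A = {
    "toothpaste": "consumer product",
    "consumer product": "environmental factor",
    "open pit mining site": "hazardous site",
    "hazardous site": "environmental factor",
}


def ancestors(term):
    """Return the broader terms of `term`, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_kind_of(term, broader):
    """True if `term` equals `broader` or sits beneath it in the hierarchy."""
    return term == broader or broader in ancestors(term)


if __name__ == "__main__":
    print(ancestors("toothpaste"))
    print(is_kind_of("open pit mining site", "environmental factor"))
```

Even this minimal structure lets data annotated with a narrow term be retrieved by a query on a broader one, which is the kind of interoperability the workshop goals describe.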

Exposure to environmental factors significantly impacts human health. The environment, broadly defined, can range from everyday products (e.g., toothpaste) to hazardous materials (e.g., open pit mining sites) and socioeconomic stressors. Consideration of this spectrum is needed to better understand how, when, and to whom exposures pose health risks. A vast amount of data is available that, if structured and integrated, could be leveraged to inform mechanistic hypotheses, therapeutic approaches, and policy making. However, a lack of semantic standards has been a major barrier to data sharing and integration (van Panhuis et al. 2014). This need for semantic standards is being recognized in many areas of biomedical research. For example, the National Research Council's report titled "Toward Precision Medicine" called for clinical and research advancements based upon systems that would be enabled by a new language standard (NRC 2011). The authors of this report (the Committee on a Framework for Developing a New Taxonomy of Disease, the Board on Life Sciences, and the Division on Earth and Life Studies) determined that "The rise of data-intensive biology, advances in information technology, and changes in the way health care is delivered have created a compelling opportunity to improve the diagnosis and treatment of disease by developing a Knowledge Network, and associated New Taxonomy, that would integrate biological, patient, and outcomes data on a scale hitherto beyond our reach" (NRC 2011).

Development of semantic standards, such as logically constructed ontologies, for EHS data, and integration of this effort within the broader biomedical context through crosscutting research programs, such as the Exposome (Wild 2005) and Big Data to Knowledge (BD2K) (Margolis et al. 2014), will enhance the capacity to inform disease research with environmental data while also improving understanding of environmental impacts on human disease. The lack of language standards and of their consistent implementation not only affects the capacity to analyze across diverse data sets but also hinders the ability to identify available data sets, limiting the value of potentially important scientific findings. A query of microbiome samples using PubMed from the National Center for Biotechnology Information (NCBI 2016) illustrates the variability in results that stems from a lack of harmonized language standards and annotation of data using such standards (Table 1). Standardization has the potential to benefit many areas of biomedical science by augmenting discovery and reuse (Richesson and Nadkarni 2011; Tenopir et al. 2015; Zimmerman 2008).
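The effect of unharmonized annotation on retrieval can be sketched with a small example. Everything below is an assumption made for illustration: the sample records, the free-text labels, and the `STANDARD_TERM` mapping are invented, and they do not reproduce the PubMed query or the contents of Table 1.

```python
# Hypothetical mapping from free-text label variants to one standard term.
STANDARD_TERM = {
    "gut microbiome": "human gut microbiome",
    "human gut microbiome": "human gut microbiome",
    "intestinal microbiota": "human gut microbiome",
    "human gut metagenome": "human gut microbiome",
}

# Hypothetical sample records annotated with inconsistent free text.
RECORDS = [
    {"id": "S1", "label": "Gut Microbiome"},
    {"id": "S2", "label": "intestinal microbiota"},
    {"id": "S3", "label": "human gut metagenome"},
    {"id": "S4", "label": "soil metagenome"},
]


def normalize(label):
    """Lowercase, collapse whitespace, then map known variants to the standard term."""
    key = " ".join(label.lower().split())
    return STANDARD_TERM.get(key, key)


def search_raw(query):
    """Exact string match against the labels as submitted."""
    return [r["id"] for r in RECORDS if r["label"] == query]


def search_standardized(query):
    """Match after both the query and the labels are mapped to standard terms."""
    target = normalize(query)
    return [r["id"] for r in RECORDS if normalize(r["label"]) == target]


if __name__ == "__main__":
    print(search_raw("gut microbiome"))
    print(search_standardized("gut microbiome"))
```

In this sketch the exact-match search misses every relevant record, whereas the search over standardized terms returns all three gut samples and excludes the soil sample. A shared vocabulary applied at annotation time would remove the need for such after-the-fact mapping.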

A few projects have specifically demonstrated the potential of adopting standards to advance EHS data integration, research, and discovery. For example, the Oceans and Human Health program [supported by the National Institute of Environmental Health Sciences (NIEHS) and the National Science Foundation] links oceanographic and metagenomics data sets (NCBI's Sequence Read Archive, Metagenomic Rapid Annotations using Subsystems Technology) (NCBI-SRA 2015; Youngblood et al. …
