Academic journal article Environmental Health Perspectives

Data Explosion: Bringing Order to Chaos with Bioinformatics. (Data Explosion)

Article excerpt

Scientists say a clearer understanding of gene-toxicant interactions will provide significant new opportunities for protecting public health. But there's a catch: the promise of toxicogenomics lies hidden in mountains of data.

Thanks to technological advances, the nucleotide sequences that make up DNA and the amino acid sequences that make up proteins are collected with robotic automation and stored by the millions in vast, expanding databases throughout the world. Microarrays, which provide snapshots of thousands of expressed genes simultaneously, are also data-intensive. Years ago, when sequencing was slow and tedious, scientists could study the output manually--no more. By necessity, they now need computers and sophisticated algorithms to wade through it all.

In recent years, the field of bioinformatics has emerged to meet these challenges. By definition, bioinformatics is the process by which informatics--the science of turning data into information--is applied to biology. A combination of computer science, information technology, and molecular biology, bioinformatics allows researchers to quickly access and interpret a rising tide of genomic information. This is critical for the genomic era: scientists are sequencing the genomes of many species, but they know little about how large regions of these genomes, and the proteins they give rise to, actually function.

In a basic application, bioinformatics allows researchers to search online databases such as GenBank for a given gene's composition, proteins, mutations, coverage in the scientific literature, and many other relevant parameters that are collectively termed "annotation." With more advanced applications, scientists use bioinformatics techniques to model chemical networks in living cells, including those stressed by disease or toxicity.
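As a rough illustration of the basic database search described above, the following Python sketch builds a query against NCBI's public E-utilities service, which provides programmatic access to GenBank records. The function names, the example gene (CYP1A1), and the parameter defaults are assumptions chosen for illustration; none of them come from the article.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's public E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="nucleotide", retmax=5):
    """Return an ESearch URL that looks up `term` in an NCBI database.

    `db="nucleotide"` targets the nucleotide records that include GenBank.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def search_ids(term, db="nucleotide", retmax=5):
    """Run the search over the network and return matching record IDs."""
    with urlopen(build_esearch_url(term, db, retmax), timeout=30) as resp:
        payload = json.load(resp)
    return payload["esearchresult"]["idlist"]


# Hypothetical query (an assumption, not from the article): the human
# CYP1A1 gene, restricted by gene name and organism.
QUERY = "CYP1A1[Gene Name] AND Homo sapiens[Organism]"
```

Calling search_ids(QUERY) would contact NCBI and return a short list of record identifiers. The sequence and annotation behind each identifier could then be retrieved with a follow-up request to the same service.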

No researcher can possibly be familiar with all the known interactions in a cell, says Trey Ideker, a computational biologist with the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts. Bioinformatics allows scientists to access, display, and interpret systems-level information. Fueled by bioinformatics, toxicogenomics is becoming an in silico science, with computerized data mining a key source of new discoveries.

Core Repositories

The rise of modern bioinformatics is rooted in the history of protein and nucleotide sequencing. The timeline arguably dates back to 1955, the year a Nobel Prize-winning British biochemist named Frederick Sanger first sequenced the protein bovine insulin. The first completed genome, sequenced in 1977, was that of a virus called phiX174. In subsequent years, scientists have gone on to sequence the genomes of higher organisms, including the human genome, which was completed in April 2003.

At first, sequencing was a slow and tedious process. The traditional technique--which involved gel electrophoresis and autoradiography--allowed scientists to manually sequence a single DNA fragment of 300-500 base pairs in about a day. This technique has been replaced almost entirely by automated high-throughput technologies that process DNA samples and determine the arrangement of their nucleotides. The Applied Biosystems sequencers used in the decoding of the human genome, for example, are roughly 6,000 times faster than earlier approaches.

Today, sequencing is an international phenomenon. Entire consortia are devoted to sequencing the genomes of many species, including the human, the rat, the mouse, and many types of fish, birds, and microbes. Most of these sequences eventually wind up in a few publicly available databases. For nucleotides, the chief database in the United States is GenBank, maintained by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine of the NIH. GenBank was actually started by the late physicist Walter Goad of the Los Alamos National Laboratory, who began compiling sequences there in 1979 while initiating efforts to create a national DNA/RNA database. …
