AI and Bioinformatics
Glasgow, Janice, Jurisica, Igor, Rost, Burkhard, AI Magazine
This article is an editorial introduction to the research discipline of bioinformatics and to the articles in this special issue. In particular, we address the issue of how techniques from AI can be applied to many of the open and complex problems of modern-day molecular biology.
This special issue of AI Magazine focuses on some areas of research in bioinformatics that have benefited from applying AI techniques. Bioinformatics is a truly interdisciplinary field: Some researchers work closely with wet labs in the life sciences, through collaborations or by providing tools, whereas others are rooted in the theory departments of the exact sciences (physics, chemistry, or engineering) or in computer science. This wide variety creates many different perspectives and terminologies. One result of this Babel of languages is that there is no single definition of what the subject of this young field really is. Even the name of the field varies: Bioinformatics, theoretical biology, biocomputing, and computational biology are just a few of the terms used. This lack of a precise definition is not a case of "I recognize it when I see it"; rather, different representatives of the field have fairly different ideas about what it actually is.
Here, we do not attempt to impose any specific definition of the field. The particular collection of reviews presented constitutes a sparse sampling from the broad activities in the area. Larry Hunter ("Life and Its Molecules: A Brief Introduction") describes some of the concepts and terms prevalent in today's molecular biology. If you find the plethora of technical terms overwhelming, be assured that modern-day biology is far more complex than suggested by the simplified sketch presented here. In fact, researchers in life sciences live off the introduction of new concepts; the discovery of exceptions; and the addition of details that usually complicate, rather than simplify, the overall understanding of the field.
Possibly the most rapidly growing area of recent activity in bioinformatics is the analysis of microarray data. The article by Michael Molla, Michael Waddell, David Page, and Jude Shavlik ("Using Machine Learning to Design and Interpret Gene-Expression Microarrays") introduces some background information and provides a comprehensive description of how techniques from machine learning can be used to help understand this high-dimensional and abundant gene-expression data. The authors point out that applying machine learning to such data is natural but also challenging because of the data's complexity.
The term protein function is not well defined; it encompasses a wide spectrum of biological contexts in which proteins contribute to making an organism live. (Note that the term gene function is something of a misnomer in the sense that it means "the function of the protein encoded by a particular gene.") This intrinsic complexity of terminology makes it extremely difficult to build databases with controlled vocabularies for function. Furthermore, the vast majority of experimental data is buried in free-text publications. Mining free text, such as MEDLINE abstracts, and applying machine learning to interpret controlled vocabularies constitute another area of increasing activity. Rajesh Nair and Burkhard Rost ("Annotating Protein Function through Lexical Analysis") review a few of the recent methods that have begun influencing experimental research. They observe that, to date, the technically simplest tools appear to be the most successful ones and that the seemingly simplest problem, identifying the gene or protein name in a publication, constitutes one of the major bottlenecks in incorporating free-text mining systems into everyday MEDLINE searches. Ross King ("Applying Inductive Logic Programming to Functional Genomics") reviews applications of inductive logic programming that address the problem of predicting some aspects of protein function. …