Applications of Case-Based Reasoning in Molecular Biology
Jurisica, Igor, Glasgow, Janice, AI Magazine
Case-based reasoning (CBR) is a computational reasoning paradigm that involves the storage and retrieval of past experiences to solve novel problems. It is an approach that is particularly relevant in scientific domains, where there is a wealth of data but often a lack of theories or general principles. This article describes several CBR systems that have been developed to carry out planning, analysis, and prediction in the domain of molecular biology.
The domain of molecular biology can be characterized by substantial amounts of complex data, many unknowns, a lack of complete theories, and rapid evolution; reasoning is often based on experience rather than general knowledge. Experts remember positive experiences so that they can reuse solutions, and they draw on negative experiences to avoid potentially unsuccessful outcomes. As in other scientific domains, problem solving in molecular biology can benefit from systematic knowledge management using techniques from AI. Case-based reasoning (CBR) is particularly applicable to this problem domain because it (1) supports rich and evolvable representation of experiences (problems, solutions, and feedback); (2) provides efficient and flexible ways to retrieve these experiences; and (3) applies analogical reasoning to solve novel problems.
CBR is a paradigm that involves solving new problems by recalling old problems and their solutions and adapting these previous experiences, which are represented as cases. A case generally comprises an input problem, an output solution, and feedback in the form of an evaluation of the solution. CBR is founded on the premise that similar problems have similar solutions. Thus, one of the primary goals of a CBR system is to find the most similar, or most relevant, cases for new input problems. The effectiveness of CBR depends on the quality and quantity of cases in the case base. In some domains, even a small number of cases provides good solutions, but in other domains, an increased number of unique cases improves the problem-solving capabilities of CBR systems because there are more experiences to draw on. However, larger case bases can also decrease the efficiency of a system, because retrieval must search through more cases. The reader can find detailed descriptions of the CBR process and systems in Kolodner (1993). More recent research directions are presented in Leake (1996), and practically oriented descriptions of CBR can be found in Bergmann et al. (1999) and Watson (1997).
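To make the retrieve-and-reuse idea concrete, the following is a minimal Python sketch, assuming that each case's problem is encoded as a dictionary of numeric attributes and that similarity is derived from Euclidean distance. The names `Case`, `similarity`, `retrieve`, and `solve`, the attributes `a` and `b`, and the solutions `S1` to `S3` are hypothetical illustrations, not the design of any system described in this article; real CBR systems typically use richer, domain-specific representations and weighted similarity measures. Feedback is modeled as a simple success flag so that failed experiences can be skipped rather than reused.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """One stored experience: input problem, output solution, and feedback."""
    problem: dict[str, float]        # hypothetical numeric attribute encoding
    solution: str
    feedback: Optional[bool] = None  # True = worked, False = failed, None = untested


def similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Map Euclidean distance over shared attributes into (0, 1]; 1.0 means identical."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    distance = math.sqrt(sum((a[key] - b[key]) ** 2 for key in shared))
    return 1.0 / (1.0 + distance)


def retrieve(case_base: list[Case], query: dict[str, float], k: int = 1) -> list[Case]:
    """Return the k stored cases most similar to the query (linear scan)."""
    ranked = sorted(case_base, key=lambda case: similarity(case.problem, query), reverse=True)
    return ranked[:k]


def solve(case_base: list[Case], query: dict[str, float], k: int = 3) -> Optional[str]:
    """Reuse the solution of the most similar retrieved case not marked as a failure."""
    for case in retrieve(case_base, query, k):
        if case.feedback is not False:  # avoid repeating known unsuccessful outcomes
            return case.solution
    return None


# Usage with a tiny hypothetical case base.
case_base = [
    Case({"a": 1.0, "b": 2.0}, "S1", feedback=True),
    Case({"a": 5.0, "b": 5.0}, "S2", feedback=True),
    Case({"a": 1.1, "b": 2.1}, "S3", feedback=False),
]
query = {"a": 1.2, "b": 2.2}
print(retrieve(case_base, query)[0].solution)  # S3 (closest, but it failed)
print(solve(case_base, query))                 # S1 (closest successful case)
```

The linear scan in `retrieve` compares the query against every stored case, which is one concrete way a larger case base can reduce efficiency; indexing or clustering the case base is one common way to mitigate this.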
The remainder of this article describes several CBR systems that have been developed to address problems in molecular biology. We begin with a description of a recent CBR system for planning protein-crystallization experiments, followed by summaries of earlier CBR systems for gene finding, knowledge discovery in a sequence database, and protein-structure determination. We conclude with a discussion of issues related to the application of CBR in the domain.
CBR and Protein Crystallization
One of the fundamental challenges in modern molecular biology is the elucidation and understanding of the laws by which proteins adopt their three-dimensional structure. Proteins are involved in every biochemical process that maintains life in a living organism. Through an increased understanding of protein structure, we gain insight into the functions of these important molecules. Currently, the most powerful method for protein-structure determination is single-crystal X-ray diffraction, although recent breakthroughs in nuclear magnetic resonance (NMR) (Kim and Szyperski 2003) and in silico (Bystroff and Shao 2002) approaches are making these methods increasingly important. A crystallography experiment begins with a well-formed crystal that ideally diffracts X-rays to high resolution. For proteins, this step is often limited by the difficulty of growing crystals suitable for diffraction. This difficulty results partly from the large number of parameters affecting the crystallization outcome (such as protein purity and intrinsic physicochemical, biochemical, biophysical, and biological parameters) and partly from the unknown correlations between the variation of a parameter and the propensity of a given macromolecule to crystallize. …