Academic journal article Journal of Digital Information Management

Ontology-Based Design of Bioinformatics Workflows on PROTEUS

Article excerpt

Abstract. Bioinformatics acts as a bridge between life science and computer science: computer algorithms are needed to cope with the complexity of biological processes. Bioinformatics applications manage complex biological data, access distributed heterogeneous databases, and require large computing power. After introducing bioinformatics requirements, we present the architecture of PROTEUS, a Grid-based Problem Solving Environment that integrates ontology and workflow approaches to enhance the composition and execution of bioinformatics applications on the Grid. A first distributed implementation of PROTEUS on Globus is also described.

Keywords: Bioinformatics, Problem Solving Environments, Ontology, Workflow, Grid

1. Introduction

Research in the biological and medical areas (also known as biomedicine) is becoming increasingly accurate and requires high-performance computer systems and sophisticated software tools to handle large volumes of complex data. Bioinformatics research involves an increasing number of computer scientists who design new algorithms and computational platforms to provide modelling and computing power to biomedical research. Data structures and software tools have been designed to support biomedicine in decoding the entire human genetic information, also known as the genome. Today the new challenge is studying the proteome, i.e. the set of proteins encoded by the genome, to define models that represent and analyze the structure of the proteins contained in each cell, and eventually to prevent and cure cell mutations that generate human diseases, such as those producing cancerous cells [2]. However, the high number of possible proteins, as well as the huge number of possible cell mutations, requires a huge effort in designing software environments able to handle biomedical problems. Moreover, applications often need to access different and heterogeneous data sets, produced by experiments or obtained by querying biological databases (e.g. the PDB protein database [16]). Bioinformaticians are studying (biomedical) data models, as well as specialized services and software components for managing biological data. For example, the HUPO (Human Proteome Organization) Proteomics Standards Initiative aims to define standards for data representation in proteomics to facilitate data comparison, exchange and verification [33], while systems such as PEDRo introduce and implement a systematic approach to proteomic research by modelling, capturing, and disseminating proteomics experimental data [13]. The Grid community has recognized bioinformatics as an opportunity for distributed high-performance computing and collaboration applications [17].
The Life Science Grid Research Group, established under the Global Grid Forum [22], believes that bioinformatics requirements can be satisfied by Grid services and standards, and is interested in what new services Grids should provide to bioinformatics applications. To face these challenges, several Bioinformatics Grid projects (BioGrids) are emerging. Bio-GRID is developing an access portal for bio-molecular modelling resources [14]. Asia Pacific BioGRID is building a customized, self-installing version of the Globus Toolkit [20]. Finally, myGrid is developing open source high-level Grid middleware to support data-intensive bioinformatics on the Grid [31]. Moreover, the Grid community is considering workflows for designing applications [34] and ontologies for modelling the semantics of resources and services (e.g., see [21]). We wish to provide an environment that allows biomedical researchers to search for and compose bioinformatics software modules to solve biomedical problems. We focus on the semantic modelling of the goals and requirements of bioinformatics applications using ontologies, and we employ workflow methodologies and tools for designing, scheduling and controlling bioinformatics applications. These ideas are combined using the Problem Solving Environment (PSE) software development approach [18]. …
