OGSA-WebDB: An OGSA-Based System for Bringing Web Databases into the Grid

Abstract: An OGSA-based system, OGSA-WebDB, is proposed, which enables grid applications to query web databases using a standard database query language (SQL). OGSA-WebDB consists of two main components: proxy databases and a mediator. Proxy databases represent the desired web databases, while the mediator acts as an interface between the proxy databases and the web databases. The mediator accepts an SQL query from local applications and transforms the query into one or more Boolean conditions that are then sent to the target web databases. The mediator processes the SQL query in parallel, taking into account the characteristics of the web databases. Experimental results show that the query processing time is small enough to be ignored for timing considerations. The system has been fully implemented on top of the Globus Toolkit and OGSA-DAI software components.

Key words: Web databases, grid technology, grid security architecture

1. Introduction

Grid computing technology [23, 25] has been widely used in many scientific and commercial applications. Many of these applications use database management systems (DBMSs) to store and manage important datasets. There is an urgent, widespread need to interconnect pre-existing, independently operated databases in grid environments [31, 15]. This need has resulted in the formation of the Database Access and Integration Services Working Group (DAIS-WG) [4] in the Global Grid Forum (GGF).

Very often scientists and other users need to access and integrate information from multiple sources in order to obtain the information they desire. This may include accessing information from data sources available online on the web (called web databases). For example, a biologist designing a new drug might wish to access information from a drug patent database [21] or a biomedical literature citation database [11] and integrate it with information from local DNA databases.

In general, many web databases cannot be accessed directly using local database drivers or be queried using a specific database language, such as SQL. Instead, they must be accessed using HTTP GET or POST requests via web search interfaces and queried using keywords combined with Boolean operators. Web database owners usually do not make the metadata of their databases available.
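To make this access style concrete, the following minimal Python sketch builds a Boolean keyword expression and encodes it as an HTTP GET request URL. It is an illustration only, not part of OGSA-WebDB: the endpoint `http://example.org/search`, the parameter name `q`, and the `field:keyword` syntax are all assumptions, since each web database defines its own search interface.

```python
from urllib.parse import urlencode


def build_keyword_query(conditions, operator="AND"):
    """Join (field, keyword) pairs into one Boolean keyword expression.

    The field:keyword syntax is hypothetical; real search interfaces differ.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    terms = []
    for field, keyword in conditions:
        # Quote multi-word keywords so they are treated as one phrase.
        term = f'"{keyword}"' if " " in keyword else keyword
        terms.append(f"{field}:{term}")
    return f" {operator} ".join(terms)


def build_get_url(base_url, expression, param="q"):
    """Encode the expression as the query string of an HTTP GET request."""
    return f"{base_url}?{urlencode({param: expression})}"


# Hypothetical search: titles mentioning "protein folding" by author "smith".
expression = build_keyword_query(
    [("title", "protein folding"), ("author", "smith")]
)
url = build_get_url("http://example.org/search", expression)
print(expression)
print(url)
```

The sketch only constructs the request URL and does not send it. Note that nothing in it relies on a database driver or on schema metadata, which is exactly what separates this access style from SQL-based access.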

The current implementation of the specification [18, 19, 8] assumes that grid applications and grid database components can query databases using database languages, and that the databases can be accessed using specific drivers, such as JDBC. In addition, this specification requires that database owners cooperate by exposing their database metadata. Clearly, enforcing these grid-database requirements limits access to valuable information on the web for many potential grid users and applications.

One way to enable access to web databases from within a grid is to use a mediator [33] to serve as an interface between these two systems. In order to provide an effective interface, the mediator must satisfy the following grid-specific requirements:

1. It must provide an interface between the grid and web databases that allows web databases to appear to satisfy the above grid-database requirements.

2. It must comply with grid specifications so that it can be plugged easily into any grid environment.

3. It must support the Grid Security Infrastructure (GSI) for authenticating web databases.


Many mediator and data integration systems have been proposed, as noted in Section 7. However, most of them fail to satisfy at least one of the above requirements and therefore cannot be used to mediate between grid and web databases.

In this paper, an OGSA-based system, OGSA-WebDB, is proposed that enables access to web databases and also provides for the integration of web databases from within the grid. The system is designed such that it satisfies the above grid-specific requirements. …

