Academic journal article Journal of Digital Information Management

Hybrid Storage Scheme for RDF Data Management in Semantic Web

Article excerpt

ABSTRACT. With the advent of the Semantic Web as the next generation of Web technology, large volumes of Semantic Web data described in RDF are expected to appear in the near future. Most previous approaches treat RDF data as triples and store them in a single large relational table. Because processing a query then requires scanning the whole table, retrieval performance may degrade, and the approach does not scale well. In this paper, we propose a hybrid storage approach for RDF data management. The proposed approach aims to provide good query performance, scalability, manageability, and flexibility. To achieve these goals, we distinguish some frequently appearing properties in RDF data. The set of RDF data with a distinguished property is treated independently and stored together in a corresponding property-based table. To process a query that specifies such a property, we can avoid scanning all of the data and need only access the corresponding table. For queries with specific properties, the proposed scheme achieves better performance than the previous approach.

Categories and Subject Descriptors

B.4 [Input/Output and Data Communications]; D.2.12 [Interoperability Web-based Services]; E.2 [Data Storage Representations]; H.2 [Database Management]

General Terms

Hybrid data storage, W3C, Web data management

Keywords: RDF Data management, Semantic web, Data storage scheme

1. Introduction

The W3C has established the Semantic Web as the next-generation Web. The Semantic Web extends the current Web by giving Web information a well-defined meaning, its so-called semantics, so that computers can process it. This semantic data attached to Web information is the foundation of the Semantic Web. The W3C therefore released the Resource Description Framework (RDF) to represent and exchange semantic data about resources on the Web [1]. In this paper, we call these data Semantic Web data or, more concisely, RDF data.

As the range of Semantic Web applications is expected to expand, enormous amounts of Semantic Web data will appear in the near future. For example, MusicBrainz is one of the first of what might be called Semantic Web services [12]. It provides information about musical artists, song titles, and so on using metadata described in RDF. Thus, we strongly believe that efficiently storing and managing Semantic Web data plays a key role in realizing the vision of the Semantic Web.

In order to manage RDF data, most previous approaches use traditional database management systems such as RDBMS and ORDBMS [2][3][4][14]. In these approaches, RDF data is represented as a set of triples and stored in a single large relational table (a so-called triple table). From a data management viewpoint, this has the advantage of directly using the full power of database management systems. However, because processing a query always requires scanning the whole table, retrieval performance may degrade. In addition, maintaining a single large triple table does not scale well.
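To make the triple-table layout concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample FOAF triples are illustrative assumptions and do not come from the systems cited above. The table is deliberately left without an index, so the query plan reports a scan of the whole table.

```python
import sqlite3

# Hypothetical sample data; not taken from the paper.
SAMPLE_TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
]

QUERY = "SELECT subject, object FROM triples WHERE property = ?"


def build_triple_table():
    """Create an in-memory triple table (assumed schema, no index)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", SAMPLE_TRIPLES)
    return conn


def find_by_property(conn, prop):
    """Return (subject, object) pairs for one property, sorted for stability."""
    return sorted(conn.execute(QUERY, (prop,)).fetchall())


def query_plan(conn, prop):
    """Return the plan descriptions SQLite reports for the property query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (prop,)).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = build_triple_table()
names = find_by_property(conn, "foaf:name")
plan = query_plan(conn, "foaf:name")
```

Every triple, whatever its property, is examined to answer the foaf:name query here. An index on the property column would change the plan, so the sketch only illustrates the unindexed case.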

Recently, Ding et al. [8] reported an analysis of the empirical usage of properties in FOAF (Friend-of-a-Friend) data and identified the most frequently used properties. We focus on the fact that, among all properties in the FOAF vocabulary, a small number of properties (about 5) account, on average, for over 50% of the total usage. We believe that, since the most frequently used properties will continue to be widely used both in generating future FOAF documents and in forming user queries, it is more efficient to manage them in a special manner.
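One way to identify such frequently used properties is sketched below. The excerpt does not state the actual selection rule, so the cumulative-coverage threshold, the function name distinguished_properties, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def distinguished_properties(triples, coverage=0.5):
    """Pick the most frequent properties until they cover `coverage` of usage.

    Assumed rule for illustration only: walk the properties from most to
    least used and stop once the chosen ones account for at least the
    requested share of all triples.
    """
    counts = Counter(prop for _, prop, _ in triples)
    total = sum(counts.values())
    chosen, covered = [], 0
    for prop, n in counts.most_common():
        if covered / total >= coverage:
            break
        chosen.append(prop)
        covered += n
    return chosen


# Hypothetical usage data: foaf:name x3, foaf:knows x2, foaf:mbox x1.
sample = [
    ("ex:a", "foaf:name", "A"),
    ("ex:b", "foaf:name", "B"),
    ("ex:c", "foaf:name", "C"),
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:b", "foaf:knows", "ex:c"),
    ("ex:a", "foaf:mbox", "mailto:a@example.org"),
]
top_half = distinguished_properties(sample, coverage=0.5)
top_most = distinguished_properties(sample, coverage=0.8)
```

With this sample, foaf:name alone covers half of the triples, while reaching 80% coverage also requires foaf:knows.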

To enhance query performance, we propose in this paper a novel storage scheme for managing RDF data. We also aim to provide scalability, manageability, and flexibility. We maintain RDF data not in a single large table but in several independent tables. …
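The idea of keeping distinguished properties in their own tables can be sketched as follows. This is not the authors' implementation: the class name HybridStore, its methods, and the choice to keep all remaining triples in one generic triple table are assumptions made for illustration, and in-memory lists stand in for relational tables.

```python
class HybridStore:
    """Toy hybrid store: one table per distinguished property plus a
    generic triple table for everything else (assumed layout)."""

    def __init__(self, distinguished):
        # One (subject, object) table per distinguished property.
        self.property_tables = {prop: [] for prop in distinguished}
        # Fallback table holding full (subject, property, object) triples.
        self.triple_table = []

    def add(self, subject, prop, obj):
        table = self.property_tables.get(prop)
        if table is not None:
            table.append((subject, obj))
        else:
            self.triple_table.append((subject, prop, obj))

    def query(self, prop):
        """Return (subject, object) pairs for a property."""
        table = self.property_tables.get(prop)
        if table is not None:
            # Only the matching property table is read.
            return list(table)
        # Other properties still require scanning the generic table.
        return [(s, o) for s, p, o in self.triple_table if p == prop]


store = HybridStore(["foaf:name"])
store.add("ex:alice", "foaf:name", "Alice")
store.add("ex:bob", "foaf:name", "Bob")
store.add("ex:bob", "foaf:knows", "ex:alice")
```

A query on foaf:name touches only its own two-row table, whereas a query on foaf:knows falls back to filtering the generic table.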
