Academic journal article Journal of Digital Information Management

Improving Data Management in a Distributed Environment

Article excerpt

1. Introduction

Data allocation is a key performance factor for distributed database and data warehouse systems. However, finding an optimal data allocation in a distributed environment is difficult, mainly because many design factors must be considered together. Assuming that the database is properly fragmented, the designer has to decide on the optimal allocation of the fragments to the various sites on the network and determine which copy or copies of the data to access, where to process the data, and how to route it.

Typically, users at each site or node have their own set of information requirements. Some of these involve data that is unique to users at a single node; others require data that is shared among users at multiple nodes. To satisfy a user request in a distributed environment, the system must determine where the needed data is located and identify a strategy that specifies which copy of the data to access and where to process it.

2. Related Work

A distributed database comprises a set of fragments of databases stored at multiple sites that work together and appear as a single database to the user. Each database server in the distributed database is controlled by its local database management system. The objective of data distribution is to meet the information needs of business organizations having different sites with one or more computer systems connected via some communications network.

In a distributed warehouse database system, the allocation of data over the different sites or nodes of the network is a critical aspect of the database design effort. A poor distribution can lead to higher loads, and hence higher costs, in the nodes or in the communication network, so that the system cannot handle the required set of transactions efficiently.
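The effect of a poor distribution on communication cost can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not taken from the article: fragments are non-replicated, each access by a site to a fragment costs a fixed amount depending only on the pair of sites, and node load is ignored. The names `allocation_cost` and `best_single_site`, and the shape of the input dictionaries, are hypothetical.

```python
from typing import Dict, List

# Assumed inputs (hypothetical shapes, for illustration only):
#   access_freq[fragment][site] -> number of accesses from that site
#   comm_cost[site][host]       -> cost of one access from site to host
#   placement[fragment]         -> site that stores the fragment

def allocation_cost(
    access_freq: Dict[str, Dict[str, int]],
    comm_cost: Dict[str, Dict[str, int]],
    placement: Dict[str, str],
) -> int:
    """Total communication cost of a non-replicated placement."""
    total = 0
    for fragment, per_site in access_freq.items():
        host = placement[fragment]
        for site, freq in per_site.items():
            total += freq * comm_cost[site][host]
    return total


def best_single_site(
    access_freq: Dict[str, Dict[str, int]],
    comm_cost: Dict[str, Dict[str, int]],
    sites: List[str],
) -> Dict[str, str]:
    """Place each fragment at the site that minimises its own access cost.

    Under this simplified model fragments are independent, so choosing
    the cheapest host per fragment also minimises the total cost.
    """
    placement = {}
    for fragment, per_site in access_freq.items():
        placement[fragment] = min(
            sites,
            key=lambda host: sum(
                freq * comm_cost[site][host] for site, freq in per_site.items()
            ),
        )
    return placement
```

Comparing `allocation_cost` for the placement returned by `best_single_site` against an arbitrary placement shows how much traffic a poor distribution adds. A realistic model would also account for node load, storage limits, replication and update costs, which this sketch leaves out.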

The fragment allocation problem has been extensively studied in both static and dynamic environments. In a static environment, where the access probabilities of nodes to the fragments never change, static allocation schemes have been proposed that fix the placement at design time, based on static data access patterns. However, in a dynamic environment, where these probabilities change over time, a static allocation would degrade database performance. Initial studies on dynamic data allocation give a framework for data redistribution and demonstrate how to perform the redistribution process in the minimum possible time. In [4], a dynamic data allocation algorithm for non-replicated database systems, named the optimal algorithm, is proposed, but no modelling is done to analyze the algorithm. In [14], the threshold algorithm is proposed for dynamic data allocation; it reallocates data with respect to changing data access patterns, with a special focus on load balancing issues.
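The idea of reallocating a fragment as access patterns change can be sketched with a simple counter-based scheme. This is an illustration in the spirit of threshold-style reallocation, not a reproduction of the algorithm in [14]: the class name `Fragment`, the reset-on-local-access rule, and the choice to migrate to the site whose access crosses the threshold are all assumptions made for this example.

```python
class Fragment:
    """A non-replicated fragment that migrates after repeated remote accesses.

    Hypothetical illustration: a counter tracks consecutive remote accesses
    and is reset whenever the owning site accesses the fragment locally.
    """

    def __init__(self, owner: str, threshold: int) -> None:
        self.owner = owner            # site currently storing the fragment
        self.threshold = threshold    # remote accesses tolerated before moving
        self.remote_count = 0         # consecutive remote accesses so far

    def access(self, site: str) -> str:
        """Record an access from `site` and return the owner afterwards."""
        if site == self.owner:
            # A local access indicates the current placement is still useful.
            self.remote_count = 0
            return self.owner

        self.remote_count += 1
        if self.remote_count > self.threshold:
            # Too many remote accesses in a row: move the fragment to the
            # site that made the access which crossed the threshold.
            self.owner = site
            self.remote_count = 0
        return self.owner
```

With `threshold=2`, a fragment owned by site A stays at A for the first two remote accesses from site B and moves to B on the third. A larger threshold makes placement more stable but slower to react to a shift in the access pattern.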

Many authors have considered various aspects of the allocation problem in a variety of contexts. For example, [12] incorporate security considerations into the fragment allocation process; [13] consider allocation in the context of multidimensional databases; [11] present an algorithm for allocation and replication that adapts to the changing patterns of online requests; [2] consider incorporating partitioning into an automatic design framework; and [6] considers incremental allocation and reallocation based on changes in workload.

[15] consider the related problem of distributing the documents of a Web site among the server nodes of a geographically distributed Web server. The problem of replica placement is considered in [7] for networks using a read-one-write-all policy, and in [10] for wide-area systems, while [Buchholz and Buchholz 04] consider it in the context of content delivery networks.

Various approaches have already been adopted to solve the data allocation problem in distributed systems [8], [6], [5]. Some approaches are limited in their theoretical and implementation parts [3], [9]. …
