A Novel Access Control Strategy for Distributed Data Systems

Article excerpt

1. Introduction

When implementing distributed application systems, the data model and access controls must ensure that data distribution is reliable, scalable, and secure. Distributed data systems are extremely difficult to design on top of relational databases. Most large-scale storage systems are therefore non-relational key/value databases, such as eBay Odyssey [1], Yahoo PNUTS [2], Google BigTable [3, 4], Amazon Dynamo [5], and Amazon SimpleDB. In these systems, scalability, consistency, availability, and partition tolerance are commonly desired properties. In addition to these properties, a high-quality distributed data system must take security into account, especially in cloud computing [6-11]. However, enhancing system security usually weakens scalability and openness significantly. To balance security against scalability and openness, many access control approaches have been proposed. Sandhu et al. [12] presented the RBAC96 model, which introduces the concept of hierarchy into roles, so that organization structures are also modeled as role hierarchies. However, organization structures are not suitable for direct implementation as roles, since they are naturally administrative domains. Oh et al. [13, 14] introduced the ARBAC02 model, in which organization structures are used to define user pools and permission pools with a refined prerequisite condition specification, so that these pools are independent of roles and role hierarchies. As for the desired system properties, Gilbert and Lynch [15] proved that a distributed system cannot guarantee consistency, availability, and partition tolerance all at once. Pritchett [16] proposed the BASE (Basically Available, Soft state, Eventually consistent) model, which is diametrically opposed to ACID (Atomicity, Consistency, Isolation, Durability). Under this model, private APIs (Application Programming Interfaces) rather than standard SQL serve as the main interfaces. The model offers little support for complex queries, integrity constraints, and joins, all of which have to be implemented through complex application programming.

In this paper, we introduce a data distribution model based on data multitrees, that is, semantic clusters of relational data, and we discuss the access control strategy of this data model in detail. In our data distribution model, the database schema is expressed as a schema graph, and a database instance is viewed as a data graph. Tuples are the nodes of the data graph, and references between tuples are its directed edges. We adopt a generalized tree structure called a multitree [17, 18], in which a node can have multiple parent nodes. This differs from the traditional hierarchical data model, which is limited to a single root. All schema graphs and data graphs are transformed into multitrees. If circuits and diamonds [17] can be reduced or removed from the database schema graphs, the resulting data graphs are data multitrees. Even if schema graphs and data graphs contain circuits or diamonds, both are still treated globally as multitrees. Since the granularity of multitrees is coarser than that of fragments, the complexity of distribution decreases significantly. In the multitree model, each user has a maximum access range corresponding to the user's multitree, so the model integrates naturally with security. Our access control is refined into two parts: operation access control and data access control. The multitree model and its access control strategy help to break through the dilemma between scalability and security control.
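
To make the multitree notion concrete, the following Python sketch models a data graph as a dictionary of parent-to-child references and checks the two conditions mentioned above: no circuits and no diamonds (two distinct directed paths between the same pair of nodes). It is only an illustration under stated assumptions, not the model developed in later sections. The edge direction (parent to child), the function names build_data_graph, is_multitree, and access_range, and the reading of a user's maximum access range as the set of tuples reachable from that user's root tuple are all hypothetical choices made here.

```python
def build_data_graph(references):
    """Build a data graph from (parent, child) reference pairs.

    Assumption: each tuple is identified by a hashable key, and each
    reference is drawn as a directed edge from parent to child.
    """
    graph = {}
    for parent, child in references:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    return graph


def has_circuit(graph):
    """Detect a directed circuit with a three-color depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for child in graph[node]:
            if color[child] == GRAY:
                return True  # back edge: a circuit exists
            if color[child] == WHITE and visit(child):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


def is_multitree(graph):
    """Return True if the graph has no circuits and no diamonds."""
    if has_circuit(graph):
        return False
    for start in graph:
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for child in graph[node]:
                if child in seen:
                    return False  # second path from start to child: a diamond
                seen.add(child)
                stack.append(child)
    return True


def access_range(graph, root):
    """Hypothetical access range: the root tuple plus all its descendants."""
    reachable = {root}
    stack = [root]
    while stack:
        for child in graph[stack.pop()]:
            if child not in reachable:
                reachable.add(child)
                stack.append(child)
    return reachable


# emp1 has two parents (dept1 and proj1), which a multitree allows.
data_graph = build_data_graph(
    [("dept1", "emp1"), ("dept1", "emp2"), ("proj1", "emp1")]
)
# a reaches d through both b and c, which forms a diamond.
diamond = build_data_graph(
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
)
```

In this sketch, is_multitree(data_graph) holds while is_multitree(diamond) does not, and access_range(data_graph, "dept1") contains dept1, emp1, and emp2 but not proj1.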

The rest of the paper is organized as follows. Related work on data distribution models and access control strategies is reviewed in Section 2. In Section 3, we introduce a multitree model for distributed data systems. Its access control strategy is presented in Section 4. The performance of our approach is evaluated in Section 5. Finally, conclusions and future work are given in Section 6.

2. …