Data-Flow Modeling: A Survey of Issues and Approaches

This paper presents a survey of previous research on modeling the data-flow perspective of business processes. Current research on modeling and analyzing business process models focuses on control-flow modeling (i.e., the activities of the process) and pays very little attention to the data-flow perspective. Yet data is essential in a process: to execute a workflow, the tasks need data, and if data is missing or not available on time, the control flow cannot be executed. For some time, various researchers have tried to investigate the data-flow perspective of process models or to combine control flow and data flow in one model. This paper surveys those approaches. We conclude that no existing model shows a clear data-flow perspective focused on how data changes during process execution. The literature offers some related approaches, ranging from data modeling with elements from the relational database domain, through process model verification, to elements related to Web Services.

Keywords: Data Flow, Process, Workflow

1 Introduction

Nowadays, more and more information systems are process-centric; that is, the trend is to create and configure information systems focused on processes. This was not always the case. The 1970s and 1980s were dominated by data-driven approaches [1], which led to the development of data-centric information systems. The trend then shifted, and the era of user-interface-centric applications began. Process-driven approaches appeared at the beginning of the 1990s. Since then, the focus has been on analyzing the control-flow view of the process, on evaluating the order of tasks during workflow execution, or on modeling and extracting models from system logs. This type of analysis did not involve data or, if it did, data was only mentioned as part of the control flow without being analyzed in detail. Considering this, we argue that the data-flow perspective lies outside the scope of most workflow and process researchers, who focus on analyzing the control-flow perspective [2], [3], [4], [5], [6], [7], [8]. Existing data mining techniques are too data-centric [4], while research in the process mining field emphasizes the control-flow perspective.

This paper argues that a new approach that balances these two extremes is needed. The reason is that running a process (i.e., executing the control flow) requires tasks to be enabled or disabled, and this is decided at the data level. If data is missing or is not available when it is needed, the entire execution of the workflow stops. Some data is available at the beginning of the workflow, but other data is generated during the execution of the workflow, after a specific task is executed. Therefore, we are dealing with input and output data elements: each activity is characterized by a set of input data elements and a set of output data elements.
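The dependency between data availability and task enabling described above can be illustrated with a minimal sketch. It is not taken from any of the surveyed approaches: the `Activity` class, the `run` function, and the order-handling activities and data elements are hypothetical names chosen only for illustration, and the scheduling rule (execute any pending activity whose inputs are all available) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical activity: a name plus its input and output data elements."""
    name: str
    inputs: frozenset
    outputs: frozenset


def run(activities, initial_data):
    """Execute every activity whose input data elements are all available.

    Returns the names of executed activities (in execution order), the names
    of activities that stayed blocked, and the final set of available data.
    """
    available = set(initial_data)
    executed = []
    pending = list(activities)
    progress = True
    while pending and progress:
        progress = False
        for activity in list(pending):  # iterate over a copy; pending is mutated
            if activity.inputs <= available:      # enabled: all inputs present
                available |= activity.outputs     # executing produces the outputs
                executed.append(activity.name)
                pending.remove(activity)
                progress = True
    blocked = [activity.name for activity in pending]
    return executed, blocked, available


# Hypothetical order-handling workflow (assumed for illustration only).
workflow = [
    Activity("check_stock", frozenset({"order"}), frozenset({"stock_status"})),
    Activity("create_invoice", frozenset({"order", "stock_status"}), frozenset({"invoice"})),
    Activity("ship", frozenset({"invoice", "shipping_address"}), frozenset({"tracking_number"})),
]

# "shipping_address" is missing, so "ship" is never enabled.
executed, blocked, available = run(workflow, {"order"})
```

In this sketch, `stock_status` and `invoice` are examples of data generated during execution, while `order` is available from the start. Because `shipping_address` is never supplied, the last activity cannot be enabled and the execution stops before completion.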

Real data from enterprises is represented in an abstract way and stored in databases (see Figure 1). An Entity Relationship Diagram (ERD) offers an abstract view of the data in order to depict a database; it is created from the data and uses abstract concepts. On the other hand, analyzing system log data with process mining techniques and methods enables a process model to be extracted automatically. Based on this process model, modeling experts may enrich the model with the data-flow perspective. Important sources of knowledge for an expert are the ERD concepts and elements from the software design documentation. Once the data flow is modeled, it must be verified for conformance with the analyzed data.
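The final verification step can be sketched as follows, under strong assumptions that are not part of the original text: the modeled data flow is a dictionary mapping each activity to its expected input and output data elements, and each logged event records which data elements it read and wrote. The function `check_trace`, the log format, and the example activities are all hypothetical.

```python
def check_trace(model, trace):
    """Compare one logged trace against a modeled data flow.

    model: dict mapping activity name -> (expected inputs, expected outputs)
    trace: list of (activity name, data read, data written) tuples
    Returns a list of (position, activity, description) deviations.
    """
    deviations = []
    for position, (activity, read, written) in enumerate(trace):
        if activity not in model:
            deviations.append((position, activity, "unknown activity"))
            continue
        expected_inputs, expected_outputs = model[activity]
        missing_reads = set(expected_inputs) - set(read)
        missing_writes = set(expected_outputs) - set(written)
        if missing_reads:
            deviations.append(
                (position, activity, f"did not read {sorted(missing_reads)}"))
        if missing_writes:
            deviations.append(
                (position, activity, f"did not write {sorted(missing_writes)}"))
    return deviations


# Hypothetical data-flow model and logged traces (assumed for illustration).
model = {
    "check_stock": ({"order"}, {"stock_status"}),
    "create_invoice": ({"order", "stock_status"}, {"invoice"}),
}
conformant_trace = [
    ("check_stock", {"order"}, {"stock_status"}),
    ("create_invoice", {"order", "stock_status"}, {"invoice"}),
]
deviating_trace = [
    ("check_stock", {"order"}, {"stock_status"}),
    ("create_invoice", {"order"}, {"invoice"}),  # stock_status was not read
]
```

Here an empty result from `check_trace` means the trace agrees with the modeled data flow; each reported deviation points to an event where the log and the model disagree and that an expert would need to inspect.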

2 Related Work

The literature offers different ways to model the data-flow perspective of a process, but (a) none of them provides a step-by-step view of how data is transformed during workflow execution, and (b) not all of them address the discovery of a data model from event logs. This section reviews previous research that approached the problem of modeling data flows and, more specifically, the process data perspective. …

