Academic journal article Informatica Economica

From Natural Language Text to Visual Models: A Survey of Issues and Approaches

Article excerpt

1 Introduction

Organizations seek to automate their processes in order to improve efficiency, reduce costs, and/or reduce human errors easily and quickly. Business process management (BPM) methods provide a solution to this issue. In this context, demand for information systems such as CRM, ERP, and SCM has increased. The main problem is the length of business process specifications. If new regulations appear, these specifications must be adapted. Manual extraction of visual models is time-consuming. Over time, a series of solutions have been proposed. The literature shows a multitude of approaches that extract data models [1], [2], [3] and process models [4], [5] from Natural Language (NL) text. This paper aims to analyze the Natural Language Processing (NLP) techniques and tools used to produce different types of visual representations.

In the last few years, approaches based on NLP have been developed in order to automate the extraction of models from NL text. NLP plays an important role in NL text analysis, as it tries to understand speech and text as human beings would. Colloquialisms, abbreviations, and typos make this task challenging. NLP has its origins in the 1950s, when Alan Turing proposed the Turing test [6] by introducing the imitation game. Since then, the literature has shown a plethora of NLP tools [7], [8], [9], [10] using several machine learning techniques. Our focus is on those applied to the discovery of data models and process models from NL text, from the first language parser [11] to current ones such as NLTK [7], [8], ANTLR, etc.

Linguistic analysis is closely tied to NLP. Liddy [12] highlights seven levels of linguistic analysis: a) Phonetic or Phonological level: how words are pronounced; b) Morphological level: analysis of prefixes, suffixes, and roots; c) Lexical level: word-level analysis, including lexical meaning and Part-Of-Speech (POS) analysis; d) Syntactic level: grammatical analysis of words in a sentence; e) Semantic level: determining the possible meanings of sentences; f) Discourse level: interpreting structure and meaning for texts larger than a sentence; g) Pragmatic level: understanding the purpose of a language.
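To make the morphological level more concrete, the following minimal Python sketch splits a word into prefix, root, and suffix. The affix lists and the function name `split_morphemes` are hypothetical and chosen only for illustration; they are not taken from Liddy [12] or from any of the surveyed tools, and a real morphological analyzer would rely on a full lexicon and rule set.

```python
# Toy affix lists (assumption): purely illustrative, not a real morphological lexicon.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ing", "ed", "ly", "s")


def split_morphemes(word):
    """Split a word into (prefix, root, suffix) using the toy affix lists.

    An affix is only stripped if more than two characters remain,
    which avoids splitting very short words such as "red".
    """
    word = word.lower()
    # Pick the first listed prefix that matches, or none.
    prefix = next(
        (p for p in PREFIXES if word.startswith(p) and len(word) > len(p) + 2),
        "",
    )
    rest = word[len(prefix):]
    # Pick the first listed suffix that matches, or none.
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s) + 2),
        "",
    )
    root = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, root, suffix


if __name__ == "__main__":
    for w in ("reprocessed", "models", "text"):
        print(w, split_morphemes(w))
```

Such a naive affix-stripping approach fails on irregular forms, which is one reason the higher levels of analysis depend on richer lexical resources.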

Some of the problems addressed by NLP are POS tagging, parsing, Named Entity Recognition (NER), chunking, and Semantic Role Labeling (SRL). Anaphora resolution [13] refers to the interpretation of the link between the anaphor and its antecedents.
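As a rough illustration of two of these tasks, the sketch below performs dictionary-based POS tagging followed by noun-phrase chunking. The lexicon, the tag names, and the functions `pos_tag` and `chunk_noun_phrases` are assumptions made for this example only; production tools use trained statistical models rather than a hand-written word list.

```python
# Hypothetical toy lexicon (assumption): maps lowercase words to coarse POS tags.
LEXICON = {
    "the": "DET", "a": "DET", "each": "DET",
    "customer": "NOUN", "order": "NOUN", "clerk": "NOUN", "invoice": "NOUN",
    "places": "VERB", "sends": "VERB", "checks": "VERB",
    "new": "ADJ", "signed": "ADJ",
}


def pos_tag(sentence):
    """Tag each whitespace-separated token; unknown words get the tag 'X'."""
    tokens = sentence.lower().replace(".", "").split()
    return [(tok, LEXICON.get(tok, "X")) for tok in tokens]


def chunk_noun_phrases(tagged):
    """Group runs of determiners/adjectives followed by one or more nouns."""
    chunks, current, has_noun = [], [], False
    for tok, tag in tagged:
        if tag == "NOUN":
            current.append(tok)
            has_noun = True
        elif tag in ("DET", "ADJ") and not has_noun:
            current.append(tok)
        else:
            # The current candidate ends here; keep it only if it contains a noun.
            if has_noun:
                chunks.append(" ".join(current))
            current, has_noun = [], False
            # A determiner or adjective may start the next candidate phrase.
            if tag in ("DET", "ADJ"):
                current.append(tok)
    if has_noun:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    tagged = pos_tag("The customer places a new order.")
    print(tagged)
    print(chunk_noun_phrases(tagged))
```

In approaches that derive data models from text, noun phrases found this way are typical candidates for entities or classes, while verbs are candidates for relationships or activities.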

The remainder of the paper is organized as follows: Section 2 briefly outlines the NLP domain, describing the levels of linguistic analysis and the main NLP approaches to analyzing NL requirements. Section 3 focuses on the extraction of data and process models from NL text. This section introduces Object-Oriented Analysis and Business Process Modeling and analyzes the existing tools that discover data and process models from text. Subsequently, Section 4 summarizes the results of this work, and the conclusions are drawn in Section 5.


A detailed review of NLP is given in [14] and [15]. Jones [14] divides the history of NLP into four phases: the first starts at the beginning of the 1940s and lasts until the late 1960s, the second runs from the end of the 1960s to the end of the 1970s, the third is represented by the late 1980s, and the fourth starts in the late 1990s. Next, we detail each phase as defined in [14] and [15]. The first phase treated machine translation issues, while the second focused on artificial intelligence. The third phase can be called the grammatico-logical phase, which is followed by the lexical phase. A fifth phase is proposed in [16], where formal theories and statistical data are combined. Since this study was first published in 1994 and then reorganized in 2001, we can add a sixth phase: from 2000 until the present, in which NLP techniques are combined in order to contribute to the extraction of visual models.

Software requirements are usually written in NL, which is asymmetric and irregular [17]. …
