Academic journal article Journal of Information Systems Education

An Experimental Investigation of Complexity in Database Query Formulation Tasks

Academic journal article Journal of Information Systems Education

An Experimental Investigation of Complexity in Database Query Formulation Tasks

Article excerpt

1. INTRODUCTION

Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. While there are many graphical user interface tools that allow end-users to summarize and view organizational data, structured query language (SQL) is still the standard programming language for formulating ad hoc queries against relational databases (Allen & March, 2006). Query formulation with SQL is a skill that is in high demand and is taught in most introductory database courses. Query formulation can be a complex task because it often includes a high degree of requirements uncertainty (e.g., ambiguity in the request for information), multiple solution paths that produce the correct result, and a high degree of information overload when working with large data models (Bowen et al., 2009; Ashkanasy et al., 2007; Borthick et al., 2001; Campbell 1988).

In this study, we investigate two factors that impact query writing performance--the ambiguity in the information request and the complexity of the target solution. We examine performance in terms of the accuracy of the query solution, the time taken to produce the solution, and the writer's confidence in the quality of his solution. The purpose of the study is to confirm the main effects of ambiguity and solution complexity on performance (as in Borthick et al., 2001) and to evaluate the interaction effects of ambiguity and complexity on performance. Our goal is to use these findings to better understand why some queries are more difficult to formulate than others, and to identify potential teaching strategies and techniques to facilitate students' acquisition of SQL skills.

2. PRIOR RESEARCH ON QUERY FORMULATION

Reisner's (1981) classic model of the query formulation process is shown in Figure 1. According to this model, the query writer is given an information request (e.g., "Find the salary of Smith's manager") and generates a mental "query template" of an SQL SELECT statement. The template specifies the structural foundation for the query. The query writer then maps elements from the information request into SQL components that can be inserted into the appropriate "slots" of the template. The mapping involves three transformational activities: (1) replacing words from the information request with elements from the data model (e.g., replacing the "salary" with the column SAL), (2) adding elements to the SELECT statement beyond what is in the information request (e.g., "Smith's manager" [right arrow] NAME = (SELECT MGR WHERE NAME = 'Smith'), and (3) ignoring terms from the information request that are not needed in the SELECT statement.

This model of template-generation-plus-mapping provides a reasonable starting point for understanding the process of query formulation and two sources of complexity in query formulation tasks--structural complexity and (lexical) transformational complexity (Reisner, 1977). Structural complexity addresses questions about the query template, such as whether the FROM clause specifies an inner or outer join, or whether a WHERE, GROUP BY, or HAVING clause is needed.

Transformational complexity stems from the complexity in the lexical mappings shown in Figure 1--the replacing, adding, and omitting of lexical elements from the natural language request to fill in the "slots" in the query template. Transformational complexity increases as the "gap" between terms in the information request and elements in the data model increases, and is influenced by the degree of ambiguity in the information request (Borthick et al., 2001). For example, a request such as, "Which customers placed online orders over $3,000 last July?" could also be worded more precisely as, "List the customer's name and account number, if the customer placed an order between July 1, 2012 and July 30, 2012 with an order total greater than 3000 and an online order flag equal to 1. …

Search by... Author
Show... All Results Primary Sources Peer-reviewed

Oops!

An unknown error has occurred. Please click the button below to reload the page. If the problem persists, please try again in a little while.