Academic journal article Journal of Information Systems Education

A Notation for Planning SQL Queries

Academic journal article Journal of Information Systems Education

A Notation for Planning SQL Queries

Article excerpt

1. INTRODUCTION

When teaching programming, teachers often emphasize planning before writing, and encourage the use of various techniques, e.g., flowcharts, to plan how the software works. As the software becomes increasingly complex, planning can be supported by design, e.g., by using class diagrams. Various planning techniques that support learning have been proposed for programming (e.g., Hu, Winikoff, and Cranefield, 2012), but SQL has received less attention despite its popularity in both education and industry. The techniques intended for supporting the learning of programming cannot be utilized as is with SQL because of the declarative (i.e., a query is a description of what) and set focused (i.e., a query is difficult or impossible to divide into working subsets) nature of SQL as opposed to the imperative (i.e., a function is a description of how) and step focused (i.e., software operates line-by-line and function-by-function) nature of programming languages such as Java, C#, or Python. These differences make the use of flowcharts unsuitable for planning SQL queries.

The more complex the query is, the more strain it puts on the query writer's short-term memory (e.g., de Jong, 2010, for working memory in general; Smelcel, 1995, for working memory in SQL in particular). Additionally, Ahadi et al. (2016) found that omission errors are among the most common errors when students are learning SQL and proposed that following a systematic procedure and segmenting the question could be the solution for avoiding omission errors. Additionally, even though the syntax of SQL is relatively simple, during the query writing process, the writer must recall SQL keywords with their syntax and semantics, in addition to the database object names, namespaces, and required expressions which, according to Smelcer (1995), often causes strain on the student's short-term memory. Furthermore, Buitendijk (1988) discussed that one of the four major reasons for writing incorrect SQL queries was the complexity of the task. Our work introduces a simple and intuitive notation for planning SQL queries (NPSQ) which is not based on any existing notation. The purpose of the notation is two-fold. First, to assist the student in acquiring the big picture of more complex queries, and second, to separate logic and semantics from syntax, thus alleviating the strain on the student's short-term memory.

The notation can be utilized in any database course that involves SQL. We have used the notation in an introductory database course with approximately 250 to 350 students (depending on the year), mandatory for undergraduate students who major in information systems or computer science, who typically have no previous experience in SQL. We have taught SQL from the SQL standard's perspective as proposed by Randolph (2003). In addition to positive student feedback, several industry professionals have indicated that the notation has proven increasingly useful when planning more and more complex queries.

2. BACKGROUND

In this section, we first define key terms for this work. We then describe our perceptions on how a query writing process takes place in order to give background on what conceptions have driven the evolution of the notation.

2.1 Terminology

A data demand is a natural language representation of what data is needed to which a query writer, e.g., a student, is required to write an equivalent query in SQL. When a query is run, the database management system outputs an error message, or a result table which contains the rows that satisfy the query. A query plan is a picture drawn by a query writer using NPSQ. A query plan is drawn after reading the data demand but before writing the query.

Rows that satisfy a query can be limited in two ways: joins and expressions. To the extent of our teaching, a student can write a join in one of four methods: using the JOIN predicate, an uncorrelated subquery with IN, a correlated subquery with EXISTS, or with an explicit join condition without a subquery. …

Search by... Author
Show... All Results Primary Sources Peer-reviewed

Oops!

An unknown error has occurred. Please click the button below to reload the page. If the problem persists, please try again in a little while.