Academic journal article Issues in Informing Science & Information Technology

Changing Paradigms of Technical Skills for Data Engineers


The Technical Committee on Data Engineering (TCDE, 2018) of the IEEE Computer Society focuses on a variety of topics that include data design, data development, data management, and the utilization of information systems. Data topics include data security, databases, cloud computing, data models, data integration, and data quality. There are peer-reviewed, international open-access journals that focus on Data Engineering. The Data Science and Engineering (DSE, 2018) journal focuses on four main areas: 1) big data, 2) information extraction from big data, 3) the theory behind processing large volumes of data, and 4) big data analytics. A decade ago, Data Engineers relied heavily on the technology of Relational Database Management Systems (RDBMS). For example, Grisham, Krasner, and Perry (2006) described an Empirical Software Engineering Lab (ESEL) that introduced Relational Database concepts to students through hands-on learning, which they called "Data Engineering Education with Real-World Projects." However, as seismic improvements occurred in the processing of large distributed datasets, big data analytics moved to the forefront of the IT industry. As a result, the definition of Data Engineering has broadened and evolved to include newer technology that supports the distributed processing of very large amounts of data (e.g., the Hadoop Ecosystem and NoSQL Databases). This paper examines the technical skills that are needed to work as a Data Engineer in today's rapidly changing technical environment. Research is presented that reviews 100 job postings for Data Engineers from Indeed (2017) during the month of July 2017 and then ranks the technical skills in order of importance. The results are compared to earlier research by Stitch (2016) that ranked the top technical skills for Data Engineers in 2016 using LinkedIn (2018) to survey 6,500 people who identified themselves as Data Engineers.
Data Scientists and Data Engineers are in high demand according to a list of the 50 best jobs in America published by Glassdoor (2018). Data Scientist is ranked as the best job in America for 2018, and Data Engineer is ranked 33rd, with a median base salary of $100,000 and 2,816 job openings.


The number of available jobs for Data Scientists and Data Engineers has been increasing, as shown in Figure 1. Although the number of available jobs dipped slightly in 2016, the trend appears to be rebounding for 2017. IBM estimates that openings for data engineers, data scientists, and data developers will reach nearly 700,000 by 2020 (Columbus, 2017).

The Stitch (2016) report, published in March 2016, shows a rapid increase in Data Engineering jobs over the five years from 2010 to 2015, as shown in Figure 2. As mentioned previously, Stitch (2016) ranked the top technical skills for Data Engineers in 2016 using a LinkedIn (2018) survey of 6,500 people who identified themselves as Data Engineers in publicly visible personal and company profiles, skills, and professional experiences. LinkedIn is a popular business-oriented networking site whose emphasis is connecting people who share work-related interests. Stitch is a technology company that facilitates the building of data pipelines using various software products and conducts research on the topics of Data Science and Data Engineering. Stitch sent surveys to 6,500 participants and asked them to rank the top skills needed by Data Engineers. The survey results included 30,000 professional experiences, and the respondents worked at 3,400 different companies throughout the world. The results were combined to create a list of the top 20 skills needed by a Data Engineer. According to the Stitch report, the top 5 skills needed by Data Engineers were SQL, Java, Python, Hadoop, and Linux.
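
A skill ranking of this kind can be illustrated with a simple tally of how many postings or profiles mention each skill. The following Python sketch is only an illustration under stated assumptions: the skill list, the three sample postings, and the function name `rank_skills` are all hypothetical, and the code does not reproduce the procedure used by Stitch or in this paper.

```python
import re
from collections import Counter

# Hypothetical skill vocabulary and postings, invented purely for illustration.
SKILLS = ["sql", "java", "python", "hadoop", "linux"]
POSTINGS = [
    "Data Engineer: SQL, Python, and Hadoop required",
    "Seeking engineer with Java and SQL on Linux",
    "Build Python and SQL pipelines",
]

def rank_skills(postings, skills):
    """Rank skills by the number of postings that mention each one."""
    counts = Counter()
    for text in postings:
        # Tokenize on letters, digits, '+', and '#' so each skill is matched
        # as a whole word and counted at most once per posting.
        words = set(re.findall(r"[a-z0-9+#]+", text.lower()))
        for skill in skills:
            if skill in words:
                counts[skill] += 1
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = rank_skills(POSTINGS, SKILLS)
for skill, n in ranking:
    print(f"{skill}: mentioned in {n} posting(s)")
```

Counting each skill once per posting, rather than once per occurrence, keeps a single verbose posting from dominating the ranking.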

