Academic journal article Journal of Theoretical and Applied Electronic Commerce Research

Editorial: What Can We Expect from Data Scientists?

Article excerpt

Data Scientist - The New Profession

Data scientist is probably the trendiest job in Information Technology (IT) nowadays. This new profession emerged with the Big Data wave. Even though there is no such thing as an exact job profile, we expect the data scientist to handle all the Big Data challenges that are novel to us. Without being a magician, she or he shall help deliver all the magic Big Data promises. The data scientist capitalizes on unstructured data without taking the roles of a programmer, database expert, statistician, or content manager. All these professions have been around for decades. So, why invent a new one?

"More than anything, what data scientists do is make discoveries while swimming in data [...] they are able to bring structure to large quantities of formless data and make analysis possible." [3] They sketch, orchestrate, and control the discovery process. The leading paradigm in this process is to find information that meets a certain need or answers a certain problem. "We need to avoid the temptation of following a data-driven approach instead of a problem-driven one." [5] The data scientist has to develop an idea of the information required to meet our information need. We can expect data scientists to have a deep understanding of the foundations of both the nature of information and the domain. They follow a mental model of the information demand that abstractly reflects the facts they expect to encounter in the data and the way to detect and present them to us. We expect the data scientist to discover novel information that may provide us with new insights. And we want these insights to be true. It is thus part of the data scientist's responsibility to make sure that the discovered information is not only novel but also trustworthy. Data scientists cannot prove data analysis models; that exceeds their capabilities. We cannot hold them liable for information that eventually turns out to be wrong. Nevertheless, their skills should include a sound sense of plausibility that helps them raise doubts and prompts a closer look when the results of data analysis seem questionable. However, separating questionable results from plausible ones is far from trivial.

Separating Data Science from Data Fiction

When driving a car, we often encounter those nice roadside signs that sometimes make us realize that we're driving too fast. Furthermore, on some occasions there are official personnel not far from these signs measuring our speed, explaining our traffic infraction to us face-to-face, and documenting it on a speeding ticket. Doesn't this sound a bit outdated? There are enough sensors in a car that measure location, speed, time, and more. Even if the car's sensors don't measure those, the driver's cellphone can. Combining this sensor information with cartographic data about roads and speed limits, we can easily imagine that, by the end of the day, the car or the phone knows exactly every infraction we committed and can automatically trigger the issuing of speeding tickets. Isn't it only legal aspects that keep this technically feasible scenario from becoming reality?

One trait of Big Data is the availability of sensor data and its combination with already available data to generate new information, such as the detection of traffic infractions. We can extrapolate this scenario in many directions: to more types of infraction, to driving for an irresponsibly long time without a substantial break, or to patterns of aggressive driving. We can also extend it towards future possibilities if we think about sensor information from smart watches indicating a possibly problematic pulse rate, or from the car's air control sensor indicating that the driver may be driving under the influence of alcohol.
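The speeding scenario can be made concrete with a minimal Python sketch of the underlying idea: sensor readings are joined with already available road data, and the combination yields new information. Everything in it is an assumption made for illustration and does not come from the editorial: the `SensorReading` record, the `SPEED_LIMITS_KMH` table, the road identifiers, and the `detect_speeding` function are all hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """One hypothetical reading from a car or phone sensor."""
    timestamp: str    # ISO 8601 time of the reading
    road_id: str      # road the vehicle was matched to
    speed_kmh: float  # measured speed in km/h


# Hypothetical cartographic data: speed limit per road, in km/h.
SPEED_LIMITS_KMH = {"A1": 120.0, "B7": 50.0}


def detect_speeding(readings, limits, tolerance_kmh=0.0):
    """Return (reading, excess_kmh) pairs for readings above the limit.

    Readings on roads without a known limit are skipped, since no
    conclusion can be drawn from the sensor data alone.
    """
    infractions = []
    for reading in readings:
        limit = limits.get(reading.road_id)
        if limit is None:
            continue
        excess = reading.speed_kmh - limit
        if excess > tolerance_kmh:
            infractions.append((reading, excess))
    return infractions


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = [
        SensorReading("2024-01-01T08:00:00", "B7", 62.0),
        SensorReading("2024-01-01T08:05:00", "A1", 110.0),
    ]
    for reading, excess in detect_speeding(sample, SPEED_LIMITS_KMH):
        print(f"{reading.timestamp} on {reading.road_id}: {excess:.1f} km/h over the limit")
```

Neither input reveals an infraction on its own; it only appears once the two sources are combined. The `tolerance_kmh` parameter stands in for the kind of plausibility margin a data scientist would have to choose and justify before any result is trusted.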

Much like in Data Mining, the strength of Big Data originates from the combination of facts producing new information. …
