Magazine article Industrial Management





The overhype cycle of data science

Entering my third decade as a technologist, I find myself giving the same advice today that I was giving in 1998: Technology cannot solve all of the world's problems.

Technology is a powerful enabler that has revolutionized communications, food production, finance and education. But every few years, a new technology starts climbing the hype cycle by promising it will raise my children, fold my laundry, and read and send email. Data science is riding the magic claims of many vendors today.

We have seen these overhype cycles for decades.

Artificial intelligence has been through the cycle twice, each time leading to what members of the field called the AI winters. In the early 1970s, the field saw a pullback in research funding and commercial support after numerous setbacks for machine translation and voice recognition cast doubt on AI. About 20 years on, the field saw a second major retrenchment after expert systems failed to live up to expectations.

For several years, data science has ridden the overhype cycle, and the field is the latest incarnation of the overpromise of AI. Vendors sell many products and services based on data science, some of which may not make sense for a lot of organizations.

A lot of promises are made, but buyers need to keep a few things in mind.

First, data science cannot answer any question you have not already thought of. The computer still lacks the measure of energy necessary to create new solutions to original problems. Data science will only work when domain experts pose the questions for data scientists to answer. Together, data scientists and domain experts can work iteratively to answer a question they have posed together, if the answer exists in the data.

Second, data science requires the data necessary to answer a question. That means it needs examples of what you are looking for and ... hold it ... examples of what you are not looking for. For instance, knowing that Mario Lemieux scored 690 goals with the Pittsburgh Penguins has little meaning until you know that nobody else scored as many. In individual cases, the data's variety provides the context for meaning.

Finally, a data science result will only reflect whatever biases exist within the underlying data. This is particularly important when using data that is coded by hand, such as police records.

But it is also true of machine-generated data. If I shop at Amazon more than others do, I can have an outsized influence on its predictive systems. Methods exist to compensate for this, but knowing that the bias is there in the first place is critical to deployment.

Understanding these points is important for both technologists and technology managers. With this information, technologists will know what they can promise and what they can deliver. And technology managers will know what resources they need to deliver.

Data science overhype will continue, either way. We may see another AI winter because of it. But the field will continue.

Today, machine translation and voice recognition systems are in use everywhere. And expert systems underlie an entire subfield of data science. The ideas behind data science will continue on, perhaps with a bit less glamour, and data science can keep answering questions.

-James P. Howard II is a data scientist at the Johns Hopkins University Applied Physics Laboratory.

