Delivering Big Data

Article excerpt

How computer scientists convert information overload into valuable, even lifesaving knowledge.

When Superstorm Sandy slammed ashore 5 miles south of Atlantic City, N.J., on the night of October 29, where and when it hit came as no surprise. Thanks to supercomputers able to sift through ever higher mountains of data (collected from planes that fly into storms, satellites, ground stations, and weather balloons) and to advances in modeling technology, forecasters had pretty much pinpointed where it would make landfall 48 hours earlier. Even a four-day tracking forecast "is as accurate as was the two-day forecast just 15 years ago," says Ed Rappaport, deputy director of the National Hurricane Center. "In other words, communities have gained two days of preparation time."

Forecasters' success in tracking Sandy, one of the worst storms ever to hit the East Coast, offers a high-profile example of the ways scientists and researchers harness and extract wisdom from a growing deluge of digital information, known by the buzzword Big Data. Measured in terabytes, petabytes, and zettabytes, Big Data is no passing fad. Researchers predict that the ability to use artificial intelligence tools to cope with, analyze, and combine hoards of disparate data sets, both structured and unstructured, will unleash countless breakthroughs in science, medicine, commerce, and national security, ultimately helping us live healthier, safer, and more enjoyable lives. "There is simultaneous interest in Big Data from academia, government, and industry, and that bodes well," says Naren Ramakrishnan, a professor of engineering in Virginia Tech's computer science department. A report last year by the World Economic Forum called Big Data a new class of economic asset. In healthcare alone, says the consulting firm McKinsey & Co., effective use of Big Data could create more than $300 billion in value a year, with two thirds of that coming from annual cost reductions of around 8 percent.

Recognizing Big Data's potential, the Obama administration announced last March that it would spend $200 million on a research-and-development initiative, via such agencies as the National Science Foundation, the National Institutes of Health, and the Departments of Defense and Energy, to improve ways to access, store, visualize, and analyze massive, complicated data sets. For example, the Energy Department is spending $25 million to launch a new Scalable Data Management, Analysis and Visualization Institute at its Lawrence Berkeley National Laboratory. And beyond the White House initiative, the Pentagon is spending an additional $250 million on Big Data research.

The term Big Data is actually an understatement. The amount of global data should hit 2.7 zettabytes this year, then jump to 7.9 zettabytes by 2015. That's the equivalent of more than 700,000 Libraries of Congress, each with a print collection stored on 823 miles of shelves. A zettabyte is two denominations up from a petabyte. Big Data is also fairly new: IBM estimates that fully 90 percent of the data in the world today didn't exist before 2010. Where does it all come from? Well, a short list would include news feeds, tweets, Facebook posts, search engine terms, documents and records, blogs, images, medical instruments, RFID signals, and billions of networked sensors constituting the Internet of Things. "The amount of data is going sky high," says Mark Whitehorn, a professor of computing at Scotland's University of Dundee. "There's a reason why it was not collected back in the day."
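To make the unit ladder concrete, here is a minimal sketch that assumes decimal (SI) prefixes, where each step up multiplies by 1,000 (terabyte, petabyte, exabyte, zettabyte). The names `UNITS`, `BYTES`, `steps_between`, and `convert` are hypothetical helpers for illustration only. The sketch shows why a zettabyte sits two denominations above a petabyte, and how large the projected jump from 2.7 to 7.9 zettabytes is.

```python
# Hypothetical helpers illustrating decimal (SI) byte prefixes.
# Assumption: the article's units are decimal (powers of 1,000),
# not binary (powers of 1,024, as in TiB or ZiB).
UNITS = ["terabyte", "petabyte", "exabyte", "zettabyte"]

# Bytes per unit: terabyte = 10**12, petabyte = 10**15, and so on.
BYTES = {name: 10 ** (12 + 3 * i) for i, name in enumerate(UNITS)}


def steps_between(smaller: str, larger: str) -> int:
    """Count how many prefix denominations separate two units."""
    return UNITS.index(larger) - UNITS.index(smaller)


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert an amount from one unit to another via raw bytes."""
    return amount * BYTES[from_unit] / BYTES[to_unit]


if __name__ == "__main__":
    print(steps_between("petabyte", "zettabyte"))  # 2
    print(convert(1, "zettabyte", "petabyte"))     # 1000000.0
    # Projected growth from 2.7 ZB to 7.9 ZB, as a multiple.
    print(f"{7.9 / 2.7:.2f}x")                     # 2.93x
```

Under these assumptions, one zettabyte equals a million petabytes, and the forecast growth is nearly a threefold increase in about three years.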

Early Indicators

Cloud computing (networks of thousands of warehouse-size data centers) means we can now collect and store vast amounts of data at relatively low cost. And the processing power of today's supercomputers means that data can be searched and crunched, again at little expense. That capability is powering EMBERS (for early model-based event recognition using surrogates), a multiuniversity, interdisciplinary team headed by VT's Ramakrishnan that uses "surrogates" to predict societal events before they happen. …