Meaning-Based Computing: Text Analysis Takes a Great Leap Forward

HP announced on Aug. 18, 2011, that it was buying Autonomy.--Ed.

Calling a new technology game-changing sounds like marketing hype. But every now and then, it's actually true. In the knowledge management area, I would cite meaning-based computing (MBC) as the new(er) technology that will shift how businesses operate. The formative events of MBC have occurred under the radar of the mainstream media, although they caught the attention of the computer press as far back as 2006, when IDC named MBC one of the top trends in content management, search, and access technologies in a report of the same name written by Susan Feldman, Melissa Webster, Abner Germanow, and Joel N. Martin.

An article by John Markoff, "Armies of Expensive Lawyers, Replaced by Cheaper Software," published in The New York Times on March 4, 2011, online and in the March 5, 2011, printed newspaper (www.nytimes.com/2011/03/05/science/05legal.html), surfaced the capabilities of this enterprise-level software to a more general readership.

MBC unites the power of modern search protocols with recent advances in text pattern recognition, language-context analysis, and even "sentiment analysis"--which sounds somewhat mysterious. The cumulative advances in MBC are enabling computers to make far more useful inferences about the meaning of communications, even as language usage evolves. Although MBC has been used over the past 8 years with considerable success, the synergy of recent advances has gained wider attention.

While MBC demonstrates new value for enterprise computing, should information professionals be interested in it? The short answer is yes. Whenever computers learn to mimic human skills in pattern recognition and to make inferences about the meaning of language, every reference provider should take note. The longer answer is more nuanced and intriguing, because disruptive technologies usually have an outsized impact on professional work. If MBC can in fact discover enhanced meaning in data sets and document repositories, it will also generate new opportunities for innovation. With that in mind, a review of MBC and its recent breakthroughs follows, along with its historical roots. Three strategies are offered in conclusion to challenge readers to get ready for the new frontier that MBC opens up.

THE 'CONTEXT' OF CONTENT

MBC is the brainchild of University of Cambridge's Michael Lynch, who founded Autonomy, a global consultancy that uses MBC to increase productivity, to mitigate the risk of lost data, and to improve strategic planning. In studying search technologies, Lynch came to realize that data warehousing could provide access to vast repositories of both structured and unstructured data, a very important distinction. He concluded that prevailing search techniques were not keeping pace with the migration of data from structured formats (such as databases) to unstructured formats (email, telephone conference calls, documents on disorganized directory trees, and so on). He estimated that unstructured data now account for as much as 85% of the total data we use--and that much of this content cannot be recovered by standard techniques.

Lynch saw a need for search protocols that could extract meaning from data in both structured and unstructured formats, uncovering linkages between diverse documents, associating and analyzing the meaning of words in various contexts, and ultimately discovering the intentions of the writers. "Big data" made serious experimentation possible, as business firms and government agencies started managing vast amounts of information. These mega-troves presented computer scientists with ideal test beds for discovering meaning through automated analysis.

Although MBC is a new application, the theory that inspired it dates from the mid-1740s. Lynch drew inspiration from Bayes' theorem, a mathematical concept that explains the probability of things happening, including the concept of "inverse probability. …
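
As a rough illustration of the "inverse probability" idea, the sketch below applies Bayes' theorem to a toy question: given that a document contains a certain word, how likely is each topic? The topic names, the word "bank," the function name, and every probability are invented for illustration; this is not a description of how Autonomy's software works.

```python
def posterior(prior, likelihood):
    """Return P(topic | evidence) for each topic using Bayes' theorem.

    prior[t]      -- assumed P(topic t) before looking at the document
    likelihood[t] -- assumed P(evidence | topic t)
    """
    # Numerator of Bayes' theorem: P(evidence | t) * P(t) for each topic.
    joint = {t: prior[t] * likelihood[t] for t in prior}
    # Denominator: total probability of the evidence across all topics.
    evidence = sum(joint.values())
    return {t: p / evidence for t, p in joint.items()}


# Hypothetical inputs: two equally likely topics, and how often the
# word "bank" is assumed to show up in documents on each topic.
prior = {"finance": 0.5, "rivers": 0.5}
likelihood = {"finance": 0.08, "rivers": 0.02}

result = posterior(prior, likelihood)
for topic, prob in result.items():
    print(f"P({topic} | 'bank') = {prob:.2f}")
```

With these made-up numbers, seeing the word shifts the estimate toward the finance topic, which is the sense in which the calculation runs "backward" from observed evidence to a probable cause.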