Text Analytics, Legal Research, and Semantic Search Top the News

In my wrap-up at the end of 2007, I mentioned a number of trends I was watching for 2008. Among these was the increasing interest in text analytics, data extraction and mining, semantic search, and related technologies. So far, this has proven to be a hot area of activity and growth.

SAS Institute, Inc., a business intelligence (BI) and advanced analytics provider, acquired privately held Teragram Corp., a provider of natural-language processing (NLP) and advanced linguistic technology. The company says the acquisition will enhance SAS' own text mining and analytical BI offerings, as well as extend them to enterprise and mobile search. Teragram's NLP technology is well-established with a customer base that includes CNN, Forbes.com, NYTimes Digital, Sony, WashingtonPost.com, Wolters Kluwer, the World Bank, and Yahoo!.

Search expert Stephen Arnold says on his Beyond Search blog: "More buy outs are looming. With the deepening financial morass in the U.S., I also believe some of the weaker search and content processing firms are going to turn off their lights. The cost of developing, innovating, and maintaining text processing technology is far greater than most people know."

The folks at CMS Watch have characterized the recent acquisitions in the enterprise search space as a "game of musical chairs." One analyst posted this on the CMS Watch site: "While it is tempting to see this as the confirmation of a trend of convergence in BI and enterprise search, that would be downplaying the ubiquitousness of the underlying components." Many of the key components in enterprise search products, such as text filtering, are licensed from specialists, and the moves and alliances can get complicated. For example, "SAS used to work with Inxight [Software, Inc.], Teragram's main competitor, while Teragram's technology has showed up in products from Fast Search & Transfer [ASA] and Verity...." So grab a chair while you can.

Hakia.com, a meaning-based web search engine, announced that it has licensed hakia OntoSem (its Ontological Semantic technology) to RiverGlass, Inc., a provider of real-time analytics and intelligent web-information collection and analysis solutions. RiverGlass will integrate hakia OntoSem into its analysis software.

Keep an eye on other companies in this space, including Clarabridge; Nstein Technologies; TEMIS S.A.; SPSS, Inc.; Autonomy Corp.; and Endeca Technologies, Inc. Here's an interesting project that was announced recently: Collexis Holdings, Inc. is working with Thomson Scientific to combine Collexis' Knowledge Dashboard with Thomson Scientific's Web of Science to create a custom data mining solution for the research community.

Northern Light Group, LLC's MI Analyst goes beyond the extraction of nouns (people, places, and things). It uses advanced text analytics technologies to provide entity extraction for key business facets, relationship identification between entities, sentiment scoring, meaning extraction, and trend analysis. The company calls this "meaning extraction." It recently incorporated MI Analyst capabilities, formerly available just to enterprise customers, into its new free business search engine (www.nlsearch.com).

New Legal Research Tools

In my December 2007 column, I mentioned the launch of new legal research sites, including AltLaw.org and Public.Resource.org. These are expanding, and other new sites have emerged that put pressure on the commercial vendors to provide more value-added services. In February 2008, Public.Resource.org and Creative Commons published 1.8 million pages of federal case law online, including all U.S. Supreme Court cases and all Courts of Appeals decisions dating from 1950. The collection is available to developers with no restrictions on reuse at http://bulk.resource.org/courts.gov. The AltLaw.org site has added these decisions. …