Intelligent Data Analysis: Reasoning about Data

The growing importance of the automatic or semiautomatic analysis of data sets in many real-world applications has led to the emergence of the field of intelligent data analysis (IDA), which combines diverse disciplines, AI and statistics in particular. These fields complement each other: Many statistical methods, particularly those for large data sets, rely on computation, but brute computing power is no substitute for statistical knowledge. Thus, we are seeing the development of intelligent systems for data analysis.

To provide an international forum for the discussion of these topics, a series of symposia on IDA was started in 1995 (Liu 1996). In 1997, the Second International Symposium on Intelligent Data Analysis (IDA97) was held at Birkbeck College, University of London, on 4 to 6 August. Almost 130 people from 20 countries on 4 continents attended. The final program consisted of 2 invited talks and 49 reviewed presentations chosen from 107 submitted papers. The symposium was organized as a single track of oral and poster presentations so that participants could discuss all the research, which led to many informal and fruitful interactions between presenters and participants. Each poster was introduced by its author in a brief talk during special plenary sessions.

Problems arising from effective analysis of large data sets have made the data analyst's job more challenging than ever. Although data analysts now have access to a variety of statistical and AI tools capable of performing different aspects of data analysis, they certainly need further support.

At the First International Symposium on Intelligent Data Analysis (IDA95), it was concluded that there is a need for research in the area of mixed-initiative IDA tools (Liu 1996). Although data analysts now have access to a variety of statistical algorithms, these tend to be unintelligent black boxes. Because data sets are too large today to be investigated manually, data analysis tools must themselves determine areas of interest and directions in which to guide the search, and they must try to relieve the user of the boring aspects of analysis. Another theme at IDA95 was the need to integrate different techniques, sometimes from diverse disciplines. Many of the techniques at the first symposium were "component technology," and little thought was given to how these methods could cooperate in an IDA architecture or framework. Also, most techniques were demonstrated on relatively small applications; there was little discussion of very large data sets. Thus, the theme of IDA97 was set: reasoning about data and how to analyze it, particularly large amounts of data, perhaps as humans analyze it, by exploiting many methods.

Major Themes of Presentation

Work reported at the symposium included a variety of research topics on the theory and application of various techniques to data analysis problems. The principal topics covered include exploratory data analysis, preprocessing, and tools; classification and feature selection; soft computing; knowledge discovery and data mining; estimation; clustering; and qualitative models. Two entire sessions were devoted to medical applications and data quality.

David Hand of The Open University, United Kingdom, started the symposium with an exciting survey of the issues and opportunities for IDA. His paper serves as an insightful assessment of a field that, although too young and exuberant to know exactly what it is about, clearly has great potential. In addition to promoting the interdisciplinary nature of IDA, Hand made a point of defining what he calls unintelligent data analysis, or data analysis that goes too far. Analyzing data efficiently and intelligently requires skills from a variety of disciplines, and it is from real problems that solutions emerge; building abstract methods will not be helpful in the future. Appropriately then, Larry Hunter of the National Library of Medicine presented a challenging new application for the IDA community. …