Unsupervised Learning Aided by Clustering and Local-Global Hierarchical Analysis in Knowledge Exploration

ABSTRACT: Unsupervised learning plays an important role in knowledge exploration. The basic task of unsupervised learning is to find latent variables or relationships in a given dataset without assuming any regularities or patterns in advance. In this paper we apply two models, clustering analysis and hierarchical analysis, to accomplish unsupervised learning. K-Means clustering shows its strength in large-scale clustering: the original data can be preprocessed and the potential variables targeted. Correlations among these variables are then explored in the resulting sets by Local-Global Hierarchical Analysis (LGHA), which proceeds in three main steps. In the first step, we use a structural approach to find qualitative patterns in the given variables. The second step applies a quantitative algorithm to find quantitative patterns in those variables. The third and last step generates global hybrid patterns by combining, according to a certain criterion, the local patterns obtained in the first two steps. Both the K-Means and LGHA models are applied in experiments with real-world longitudinal medical datasets.

Keywords: Hierarchical Analysis, K-Means Clustering, Unsupervised Learning, Knowledge Exploration

Categories and Subject Descriptors: I.2.4 [Knowledge Representation Formalisms and Methods]; I.2.6 [Learning]; I.5.3 [Clustering]

General Terms

Knowledge representation, Learning classification

1. Introduction

From a traditional point of view, knowledge exploration can be categorized into supervised learning and unsupervised learning (Jordan and Jacobs 1994). In the last decade, there has been active research on supervised learning approaches and techniques, in which class information is available before any knowledge exploration takes place. The most widely used approach is to obtain a predetermined independent measurement in order to preferentially target classes. A classification algorithm is then applied in the data pre-processing stage (Liu and Motoda 1998, Liu and Yu 2005). However, this approach is not robust enough to be applied effectively to features with irregular sizes or to nonrecurring, high-dimensional variables.

Unsupervised learning is a recent approach in knowledge exploration. It is widely used on unlabeled data, for example to extract the relevance that exists among records. Unsupervised learning is an important supplementary method for categorizing data, since it can increase the precision of clustering results. Unlike supervised learning, unsupervised learning attempts to find the most reasonable patterns by uncovering the best relationships rather than by using preferential classification labels (Dy and Brodley 2000, 2004). Because the idea behind unsupervised learning is to run an unsupervised algorithm on raw data (Kohavi and John 1997), most researchers consider data clustering and data reduction (including dimension reduction, size reduction, etc.) to be the two key issues in the framework of knowledge exploration. Using an unsupervised learning method can save time in data processing by removing the matching and ranking process used for specified classes and by avoiding redundant analysis.

In this paper, we propose to combine two models to achieve unsupervised learning. K-Means Clustering Analysis (K-Means) is used to partition the original data according to a certain criterion. As a robust model, K-Means semiautomatically generates clusters and assigns data to different clusters. The data within these clusters are labelled before we collect observational sets.
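
The partitioning step can be illustrated with a minimal sketch of the standard K-Means iteration (Lloyd's algorithm). This is not the authors' implementation: the excerpt does not state the partitioning criterion or the initialization, so squared Euclidean distance, seeded random initialization, and all names used here (`k_means`, `records`, and so on) are assumptions made for illustration, and the data are toy values.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Partition `points` into `k` clusters (assumed criterion: Euclidean distance).

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct records drawn at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the partition has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return labels, centroids


# Toy records (hypothetical values, not taken from the medical datasets).
records = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
labels, centroids = k_means(records, k=2)
```

Here `labels` plays the role of the cluster labels attached to the data before the observational sets are collected. The number of clusters `k` is supplied by the user, which is one reading of "semiautomatically" above.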

Local-Global Hierarchical Analysis (LGHA) attempts to discover accurate and relevant correlations from observational data (Lin and Orgun 2000, Lin and Orgun 2004, Lin et al 2000, Zhang et al 2006). …