Academic journal article Library Philosophy and Practice

Bibliometric Survey on Incremental Clustering Algorithms

1. INTRODUCTION

Data is continuously generated by many important applications, such as financial transaction monitoring, intelligent energy network flows, satellite imagery, and web information processing. Data mining techniques and their results must evolve with this newly generated data. This calls for incrementally updating existing clusters with the new records rather than re-clustering the complete data from the beginning. Incremental clustering solves this problem of clustering growing data. It supports incremental learning, which results in knowledge augmentation (Mulay & Kulkarni, 2013). The augmented knowledge can be used to develop novel clustering strategies, and it can be derived from the closeness among data points. Closeness can be determined in two ways: distance-based and pattern-based (Prachi M Joshi & Kulkarni, 2011). Distance-based closeness between patterns may not suit all applications, because it can suffer from the curse of dimensionality. Pattern-based closeness, in contrast, compares the patterns of occurrences; it represents a thematic relationship and coherence between one or more objects. The Closeness Factor (CF) may be activity specific or decision specific. The working principle of the Closeness Factor-based algorithm (CFBA) (Kulkarni & Mulay, 2013) is that the closer the data points are, the higher the probability that they belong to the same chunk, called a cluster. The CF value quantifies the disparity between data series: a CF value of zero signifies that two data series match exactly, even though their volumes might differ (Kulkarni, Dwivedi, & Haribhakta, 2015). The CF-based algorithm acts as an enhancer for the knowledge augmentation process (Gaikwad, Joshi, & Mulay, 2016). CFBA creates a new cluster if new information does not match any of the already formed clusters (Swamy & Kulkarni, 2006).
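
To make the difference between distance-based and pattern-based closeness concrete, here is a minimal sketch. The function `pattern_disparity` is a hypothetical stand-in, not the CF formula from the cited papers (this excerpt does not give that formula); it only reproduces the stated property that two series with the same shape but different volumes score zero.

```python
import math


def euclidean_distance(a, b):
    """Distance-based closeness: sensitive to magnitude (volume)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pattern_disparity(a, b):
    """Hypothetical pattern-based disparity (NOT the published CF formula).

    Each non-negative series is scaled to sum to 1, so only the shape of
    the pattern matters. The result lies in [0, 1] and is 0.0 when the two
    series have the same shape, whatever their volumes.
    """
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 or total_b == 0:
        raise ValueError("series must have a non-zero sum")
    return sum(abs(x / total_a - y / total_b) for x, y in zip(a, b)) / 2


a, b, c = [1, 2, 3], [10, 20, 30], [3, 2, 1]
print(euclidean_distance(a, b))  # about 33.67: the volumes differ
print(pattern_disparity(a, b))   # 0.0: same shape, different volume
print(pattern_disparity(a, c))   # about 0.333: different shape
```

A distance-based measure separates `a` and `b` even though they follow the same pattern, which is the situation in which the text argues pattern-based closeness is preferable.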
Learning considers the behavioral pattern of the data; continuous, active learning is required for the knowledge base of a clustering algorithm to evolve (Archana Chaudhari & Mulay, 2018). CFBA modifies its knowledge and maintains patterns for reuse. The modified knowledge and stored patterns save clustering time and help in decision-making. Cluster quality is maintained while learning takes place. Thus, CFBA can play a vital role in scenarios that involve dynamic learning (Johnson & Singh, 2016). Table 1 shows advancements in single-machine versions of CFBA.
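
The incremental behaviour described above (update an existing cluster when new data matches it, otherwise open a new cluster, and never re-cluster old records) can be sketched as follows. `IncrementalClusterer`, `normalize`, and `disparity` are hypothetical names, and the fixed `threshold` is an assumption made for brevity: the CFBA variants are described as parameter-free and derive their thresholds from the data.

```python
def normalize(series):
    """Scale a non-negative series to sum to 1 so only its shape remains."""
    total = sum(series)
    if total == 0:
        raise ValueError("series must have a non-zero sum")
    return [x / total for x in series]


def disparity(p, q):
    """Half the L1 distance between two normalized patterns (0 = same shape)."""
    return sum(abs(x - y) for x, y in zip(p, q)) / 2


class IncrementalClusterer:
    """Hypothetical sketch of closeness-driven incremental clustering."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold  # assumed fixed value, not from the source
        self.centroids = []         # mean normalized pattern of each cluster
        self.counts = []            # number of series in each cluster

    def add(self, series):
        """Assign one new series and return its cluster index."""
        p = normalize(series)
        if self.centroids:
            scores = [disparity(p, c) for c in self.centroids]
            best = min(range(len(scores)), key=scores.__getitem__)
            if scores[best] <= self.threshold:
                # Update the matching cluster's centroid as a running mean;
                # earlier records never have to be re-clustered.
                n = self.counts[best]
                self.centroids[best] = [(c * n + x) / (n + 1)
                                        for c, x in zip(self.centroids[best], p)]
                self.counts[best] = n + 1
                return best
        # No existing cluster is close enough: open a new one.
        self.centroids.append(p)
        self.counts.append(1)
        return len(self.centroids) - 1


clusterer = IncrementalClusterer(threshold=0.1)
labels = [clusterer.add(s) for s in ([1, 2, 3], [10, 20, 30], [3, 2, 1], [2, 4, 7])]
print(labels)  # [0, 0, 1, 0]
```

Only the per-cluster summary (centroid and count) is kept, which is one simple way to realize the idea of maintaining patterns for reuse so that each new record costs one comparison per existing cluster.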

ECFAEF is the first version of CFBA. ECFAEF formulated probability, expected value, error, and weight terminologies related to data series. Probabilistic CFBA was compared with IK-Means and Cobweb to gauge its performance, and it outperformed both of these incremental clustering algorithms. MCFA uses Manhattan distance for cluster formation. ICNBCFA computes its threshold using Naive Bayes. CBICA is a probability-free approach: it replaces the probability-based calculation in CFBA with Pearson's correlation coefficient. TBCA uses the threshold, cluster average versus outlier average, and cluster deviation versus outlier deviation to identify impactful attributes. All these versions of CFBA follow a parameter-free clustering approach. The properties that make CFBA incremental are:

* Cluster-primary approach

* Iterative convergence

* Error based clustering

* Acquiring knowledge and augmentation

* Cluster-class assignment

* Ease of execution
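
The variants above differ mainly in how closeness is measured. The sketch below contrasts the two measures named for MCFA (Manhattan distance) and CBICA (Pearson's correlation coefficient); how each variant feeds these values into its cluster-assignment step is not described in this excerpt, so only the measures themselves are shown.

```python
import math


def manhattan(a, b):
    """MCFA-style distance: sum of absolute per-position differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


a, b, c = [1, 2, 3], [10, 20, 30], [3, 2, 1]
print(manhattan(a, b))  # 54
print(pearson(a, b))    # 1.0
print(pearson(a, c))    # -1.0
```

Manhattan distance treats `a` and `b` as far apart, while the correlation coefficient rates them as perfectly related. This makes correlation a probability-free way of capturing pattern similarity independent of volume.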

CFBA, TBCA, and CBICA process numeric and mixed-type data sets. The raw input data formats that can be supplied to these variants are:

* Time-series

* Boolean

* Spatial-temporal

* Alphanumeric

CFBA and its variants are order-independent algorithms, because the patterns in newly obtained clusters are reordered to match the patterns of already stored clusters. Because of this property, these algorithms can also handle unstructured data sets effectively. …
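
The text states only that patterns in obtained clusters are reordered to match the stored patterns. One hypothetical way to perform that matching is to try every one-to-one labelling and keep the one with the lowest total disparity. The names `align_to_stored` and `disparity`, and the brute-force search itself, are assumptions for illustration and are not taken from the source.

```python
from itertools import permutations


def disparity(p, q):
    """Half the L1 distance between two normalized patterns (0 = same shape)."""
    return sum(abs(x - y) for x, y in zip(p, q)) / 2


def align_to_stored(stored, new):
    """Relabel new cluster patterns with the stored labels that fit them best.

    stored: dict label -> normalized pattern kept from earlier runs
    new:    list of normalized patterns from the latest pass (len <= len(stored))
    Every one-to-one labelling is tried and the lowest total disparity wins,
    so the result does not depend on the order of `new` (apart from exact
    ties). Brute force: only suitable for a handful of clusters.
    """
    if len(new) > len(stored):
        raise ValueError("sketch assumes no more new clusters than stored ones")
    best = min(
        permutations(stored, len(new)),
        key=lambda labels: sum(disparity(stored[lbl], p)
                               for lbl, p in zip(labels, new)),
    )
    return list(best)


stored = {"A": [0.2, 0.3, 0.5], "B": [0.5, 0.3, 0.2]}
new = [[0.5, 0.3, 0.2], [0.25, 0.25, 0.5]]
print(align_to_stored(stored, new))                  # ['B', 'A']
print(align_to_stored(stored, list(reversed(new))))  # ['A', 'B']
```

Each new pattern receives the same stored label regardless of the order in which the new clusters arrive. For more than a few clusters, an optimal assignment solver would replace the factorial-time search.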
