Academic journal article Journal of Digital Information Management

A Novel Method of Data Stream Classification Based on Incremental Storage Tree

Academic journal article Journal of Digital Information Management

A Novel Method of Data Stream Classification Based on Incremental Storage Tree

Article excerpt

1. Introduction

Nowadays, tremendous amounts of data streams are generated in many applications such as e-commerce, wireless sensor network, network monitor, telecommunication system, stock market and meteorological data. The continuous arrival of data streams in multiple, rapid, time-varying, and potentially unbounded streams raises new challenges and research problems. Indeed, it is usually not feasible to simply store the arriving data in a traditional database management system in order to perform operations on that data later on. Rather, data stream must generally be processed in an online, incremental manner so as to guarantee that results are up-to-date and that queries can be answered with small time delay. One important data mining problem which has been studied in the context of data streams is that of classification [1]. The main thrust on data stream mining in the context of classification has been that of one-pass mining [2,3].In general, the use of one-pass mining does not recognize the changes which have occurred in the model since the beginning of the stream construction process. While the work in works on time changing data streams, the focus is on providing effective methods for incremental updating of the classification model.

Existing data stream classification approaches [4,5,6,7,8,9,10,11] are largely classified into distance, decision theory, and structure-based approaches. The distance-based approach [10,11] uses a distance metric to choose a class that is closest to the query. Well-known distance metrics such as Euclidean distance and Dynamic Time Warping (DTW) are commonly used. The decision theory-based approach [4, 5, 9] generally uses a statistical or machine learning classifier, such as Bayesian classifiers, Hidden Markov Models (HMMs), Recurrent Neural Networks (RNNs), genetic algorithms, micro-clusters, etc. These methods select the maximum likelihood class using prior probabilities and distribution statistics. Finally, the structure-based approach [6, 7, 8] builds classification rules in the form of a tree or graph structure, and then chooses the class with the most similar structure.

This paper proposes a Bayesian classification data mining algorithm based on incremental storage tree. Use sliding window to process data stream and divide it into several basic unit, apply Principal component analysis (PCA) to compress the data from window and produce dynamic incremental storage tree, use the second power strategy to continuously update several incremental storage tree. Then use multi-classifier integration technology combines with Bayesian classification to produce Bayesian classifier. At last, adjust the weight of classifier by cyclic test, and get high classification accuracy. Detailed simulation analysis demonstrates that the presented algorithm is of high efficiency of space and time and is more stable.

A review of background is given in section 2. In section 3, we introduce the structure and update method of incremental storage tree. In section 4, we introduce the structure of Bayesian classifier based on incremental storage tree. The experimental results of comparing the algorithm proposed in this paper with other algorithms are also presented in section 5. Finally, our work of this paper is summarized in the last section.

2. Background

2.1 Related work

At present, more algorithms have been used in the data stream classification. Zhang and Zhou [12] proposed a novel frame work that first categorized diverse training examples into four types and assigned learning priorities to them. Then, they derived four learning cases based on the proportion and priority of the different types of training examples. Shaker et al [13] proposed a novel model class called fuzzy pattern trees (FPTs) for machine learning. They considered the problem of learning fuzzy pattern trees for binary classification from data streams. Apart from its practical relevance, this problem is also interesting from a methodological point of view. …

Search by... Author
Show... All Results Primary Sources Peer-reviewed

Oops!

An unknown error has occurred. Please click the button below to reload the page. If the problem persists, please try again in a little while.