The Research of Algorithm for Data Mining Based on Fuzzy Theory

1. Introduction

With the advent of the information age, humankind faces more and more information processing problems, such as data storage, organization, and search. These problems are steadily growing in complexity and in spatial scale, they unfold on ever faster time scales, and their consequences and implications are increasingly extensive and far-reaching. Data mining is a new field in data processing. It is the process of extracting models, summaries, and derived values from known data sets. Data mining is an interdisciplinary field that involves database technology, machine learning, statistics, neural networks, knowledge engineering, high-performance computing, and so on. It has been widely applied in industry, agriculture, business, economics, health, and many other sectors.

The Support Vector Machine (SVM) is one of the newer methods used in data mining. It was proposed by Vapnik et al. [1] [2] [3]. With SVM, classification and regression problems can be handled more effectively. SVM has been a research focus in machine learning and has been applied successfully in many fields. However, SVM has many limitations. For example, when the training set contains uncertain information, SVM cannot handle it. In 2002, Chunfu Lin and Shengde Wang, professors at Taiwan University, put forward the Fuzzy Support Vector Machine (FSVM) method together with Shigeo Abe and Takuya Inoue, professors at Kobe University. They made some improvements to SVM but did not build FSVM at the algorithm level, and their work does not address SVMs whose training data contain uncertain information.

To solve the general problem of an SVM whose training data contain uncertain information (fuzzy parameters), we discuss fuzzy chance-constrained programming and the characteristics of fuzzy classification, together with the ways of expressing it. An algorithm for the classification Support Vector Machine is also presented in this paper.

2. Preliminary Knowledge

The research in this paper is carried out in the possibility space given in [6].

2.1 Fuzzy Chance-constrained Programming

Definition 2.1 Suppose $\tilde{a}$ is a fuzzy subset of the universe of discourse U. If U = R (the set of real numbers) and $\tilde{a}$ is a regular closed convex fuzzy set, then $\tilde{a}$ is called a fuzzy number, written $\tilde{a}$.

Definition 2.2 Suppose $\tilde{a}$ is a fuzzy number. If the membership function of $\tilde{a}$ is

$$\mu_{\tilde{a}}(x) = \begin{cases} \dfrac{x - r_1}{r_2 - r_1}, & r_1 \le x < r_2, \\ \dfrac{r_3 - x}{r_3 - r_2}, & r_2 \le x \le r_3, \\ 0, & \text{otherwise}, \end{cases}$$

then $\tilde{a}$ is called a triangular fuzzy number, written $\tilde{a} = (r_1, r_2, r_3)$. The real number $r_2$ is called the center of $\tilde{a}$, and $r_1$ and $r_3$ are called its left and right endpoints. The center is the main location of the triangular fuzzy number. A real number a can be expressed as the special triangular fuzzy number a = (a, a, a).
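As an illustration, the membership degree of a triangular fuzzy number can be computed in a few lines of Python. This is a minimal sketch rather than code from the paper: it assumes the usual piecewise-linear membership function (rising linearly from the left endpoint to the center, then falling linearly to the right endpoint), and the function name `triangular_membership` and the sample values are hypothetical.

```python
def triangular_membership(x, r1, r2, r3):
    """Membership degree of x in the triangular fuzzy number (r1, r2, r3).

    Assumes the standard piecewise-linear shape; r2 is the center and
    r1, r3 are the left and right endpoints.
    """
    if not (r1 <= r2 <= r3):
        raise ValueError("expected r1 <= r2 <= r3")
    if x == r2:
        return 1.0                       # the center has full membership
    if r1 < x < r2:
        return (x - r1) / (r2 - r1)      # rising left slope
    if r2 < x < r3:
        return (r3 - x) / (r3 - r2)      # falling right slope
    return 0.0                           # outside the support


# Hypothetical fuzzy number "about 2" with support [1, 4].
about_two = (1.0, 2.0, 4.0)
print(triangular_membership(1.5, *about_two))      # 0.5
print(triangular_membership(3.0, *about_two))      # 0.5
# A crisp real number a is the degenerate triple (a, a, a).
print(triangular_membership(5.0, 5.0, 5.0, 5.0))   # 1.0
print(triangular_membership(4.9, 5.0, 5.0, 5.0))   # 0.0
```

Treating the center as a separate case keeps the degenerate triple (a, a, a) from dividing by zero.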

Definition 2.3 Suppose $f : R \times R \to R$ is a binary function on the real numbers and $\tilde{a}$, $\tilde{b}$ are fuzzy numbers. The membership function of the fuzzy number $f(\tilde{a}, \tilde{b})$ can be defined as:

$$\mu_{f(\tilde{a}, \tilde{b})}(z) = \sup_{x, y \in R,\ z = f(x, y)} \min\{\mu_{\tilde{a}}(x),\ \mu_{\tilde{b}}(y)\}$$
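Assuming Definition 2.3 takes the usual sup-min form of the extension principle (the membership degree of z is the largest value of min(membership of x, membership of y) over all pairs with f(x, y) = z), it can be approximated numerically on a grid. The sketch below is illustrative only; the helper names `tri` and `extended_membership`, the grid resolution, and the sample fuzzy numbers are all hypothetical.

```python
def tri(x, r1, r2, r3):
    """Membership degree of x in the triangular fuzzy number (r1, r2, r3)."""
    if x == r2:
        return 1.0
    if r1 < x < r2:
        return (x - r1) / (r2 - r1)
    if r2 < x < r3:
        return (r3 - x) / (r3 - r2)
    return 0.0


def extended_membership(f, mu_a, mu_b, z, xs, ys, tol=1e-9):
    """Grid approximation of sup{ min(mu_a(x), mu_b(y)) : f(x, y) = z }."""
    best = 0.0
    for x in xs:
        for y in ys:
            # Only pairs that f maps (numerically) onto z contribute.
            if abs(f(x, y) - z) <= tol:
                best = max(best, min(mu_a(x), mu_b(y)))
    return best


# Hypothetical example: a = (1, 2, 4), b = (0, 1, 2), f(x, y) = x + y.
xs = [i / 100 for i in range(0, 501)]   # grid on [0, 5]
ys = [j / 100 for j in range(0, 201)]   # grid on [0, 2]
degree = extended_membership(
    lambda x, y: x + y,
    lambda x: tri(x, 1.0, 2.0, 4.0),
    lambda y: tri(y, 0.0, 1.0, 2.0),
    z=2.0, xs=xs, ys=ys,
)
print(degree)  # approximate membership degree of 2.0 in f(a, b)
```

A grid search is only an approximation of the supremum; it is used here because it mirrors the definition directly, not because it is efficient.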

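Theorem 2.1 Suppose $\tilde{a} = (r_1, r_2, r_3)$ and $\tilde{b} = (t_1, t_2, t_3)$ are triangular fuzzy numbers and $\rho$ is a real number. Then:

$$\tilde{a} + \tilde{b} = (r_1 + t_1,\ r_2 + t_2,\ r_3 + t_3);$$ (1)

$$\rho\tilde{a} = \begin{cases} (\rho r_1,\ \rho r_2,\ \rho r_3), & \rho \ge 0, \\ (\rho r_3,\ \rho r_2,\ \rho r_1), & \rho < 0. \end{cases}$$ (2)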

Proof: The conclusion can be directly deduced from Definition 2.3.
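For a concrete picture of the arithmetic in Theorem 2.1, the sketch below assumes the theorem states the usual component-wise rules for triangular fuzzy numbers: addition adds the three components, and multiplication by a real number scales them, swapping the endpoints when the multiplier is negative. The function names `add` and `scale` and the sample values are hypothetical.

```python
def add(a, b):
    """Sum of two triangular fuzzy numbers given as (left, center, right)."""
    return tuple(x + y for x, y in zip(a, b))


def scale(rho, a):
    """Product of a real number rho and a triangular fuzzy number a."""
    r1, r2, r3 = a
    if rho >= 0:
        return (rho * r1, rho * r2, rho * r3)
    # A negative multiplier reverses the order of the endpoints.
    return (rho * r3, rho * r2, rho * r1)


a = (1.0, 2.0, 4.0)
b = (0.0, 1.0, 2.0)
print(add(a, b))       # (1.0, 3.0, 6.0)
print(scale(2.0, a))   # (2.0, 4.0, 8.0)
print(scale(-1.0, a))  # (-4.0, -2.0, -1.0)
```

Swapping the endpoints for a negative multiplier keeps the result in (left, center, right) order, so it is again a valid triangular fuzzy number.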

Theorem 2.2 Suppose $\tilde{a} = (r_1, r_2, r_3)$ is a triangular fuzzy number. For any given confidence level $\lambda$ ($0 < \lambda \le 1$), $\mathrm{Pos}\{\tilde{a} \le 0\} \ge \lambda$ if and only if $(1 - \lambda) r_1 + \lambda r_2 \le 0$.
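A small numerical check can make the statement concrete. The sketch below rests on two assumptions that are not spelled out in the surviving text: that Pos{a <= 0} is the supremum of the membership function over x <= 0, and that the theorem's criterion is the linear inequality (1 - lambda) * r1 + lambda * r2 <= 0. The function names and sample values are hypothetical.

```python
def possibility_leq_zero(r1, r2, r3):
    """Pos{a <= 0} for a = (r1, r2, r3): sup of the membership over x <= 0."""
    if r2 <= 0:
        return 1.0               # the center itself lies in x <= 0
    if r1 < 0:
        return r1 / (r1 - r2)    # r1 < 0 < r2: supremum attained at x = 0
    return 0.0                   # the whole support lies to the right of 0


def linear_criterion(r1, r2, r3, lam):
    """Assumed crisp equivalent of Pos{a <= 0} >= lam, for 0 < lam <= 1."""
    return (1 - lam) * r1 + lam * r2 <= 0


a = (-3.0, 1.0, 2.0)             # Pos{a <= 0} = 0.75
for lam in (0.5, 0.7, 0.8, 1.0):
    direct = possibility_leq_zero(*a) >= lam
    crisp = linear_criterion(*a, lam)
    print(lam, direct, crisp)    # the two columns agree for each lam
```

The practical value of such a criterion is that a fuzzy chance constraint becomes an ordinary linear inequality in the endpoints, which a standard optimizer can handle.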

Proof: If $\mathrm{Pos}\{\tilde{a} \le 0\} \ge \lambda$, then one of the following must hold: $r_2 \le 0$, or [r.sub.1]/[[r. …