Academic journal article Journal of Digital Information Management

Mining Frequent Patterns through Microaggregation in Differential Privacy

Academic journal article Journal of Digital Information Management

Mining Frequent Patterns through Microaggregation in Differential Privacy

Article excerpt

I. Introduction

Frequent pattern mining has been well studied within the data mining context, and numerous frequent pattern mining algorithms have been developed. Among them, the Apriori algorithm [1] and the FP-Growth algorithm [2] are the two most widely used. However, these algorithms are unable to preserve the privacy of the dataset if sensitive information (e.g., patient health records) is contained therein. Bhaskar et al. [3] proposed the Trun-Fre approach based on a Laplace mechanism to achieve differential privacy. However, this does not perform well when candidate sets become overly large, as the length of the frequent pattern increases. To decrease the magnitude of a candidate set, Li et al. [4] developed the PrivBasis approach, under which:

(1) The pattern basis is generated based on maximal clique; and (2) top-k patterns are mined to achieve differential privacy. However, the released dataset to be mined often lacks utility. For example, the support of patterns {a} and {b} with the original support <3, 3> could become <2.5, 3.2> after adding Laplace noise. As we can see, the consistency constraint has been broken. The DP-topkP algorithm in [5] performs well in terms of data utility, while in this paper noise is reduced through combining the k-anonymity model with the differential privacy model.

Normally, there are two metrics when evaluating an algorithm in differential privacy: privacy and utility. Therefore, a synergy must be achieved. In [6], an approach to enhance data utility in differential privacy via microaggregation-based k-anonymity is proposed.

In this paper, microaggregated data is released by clustering the selected patterns based on the weight of each pattern to achieve k-anonymity, and top-k frequent patterns are selected based on I the microaggregated data through the exponential mechanism. Lastly, the true support of each frequent pattern is perturbed by adding Laplace noise. By adjusting the cluster size, a synergy between privacy and data utility can be achieved.

2. Background And Related Work

2.1 Mining Frequent Patterns

Frequent pattern mining is widely used in the area of knowledge discovery and can serve various purposes, among which mining association rules [7] and predicting user behavior [8] have been greatly studied.

Formally speaking, let t be a transaction record, and D = {[t.sub.1], [t.sub.2],...,[t.sub.N]} be the transaction dataset that contains a few items. A set of items tending to appear together as a pattern shall be denoted asp In this paper, we apply the FP-Growth algorithm to the original dataset D to create a set of frequent patterns with support larger than the minimum support, denoted as [P.sub.set]. For each pattern [p.sub.i] [member of] Pset, assign a weight value w. based on its frequency. The universe of denoted as p stands for the weight value set of all the frequent patterns mined through the FP-Growth algorithm.

2.2 Differential Privacy

The differential privacy model for query operation in an interactive setting was originally proposed in [9]. The differential privacy model provides significantly robust and provable privacy preservation.

Definition 1: [epsilon]--differential privacy

As discussed in [1,10], for two given neighboring datasets D and D', a randomized mechanism A gives for any s e range (A):

Pr [A (D) = s] [less than or equal to] [e.sup.[epsilon]]. Pr [A (D') = S].

In this paper, the two neighboring datasets are or, where denotes a frequent pattern. Specifically, preserving differential privacy in a frequent pattern mining setting refers to the change in the probability distribution of the [e.sup.[epsilon]]. multiplicative--bounded by selecting a given frequent pattern or not.

2.3 Achieving k-anonymity based on Microaggregation

As shown by many researchers, the k-anonymity [11] model and its extensions have a significant influence on the research of privacy preservation. …

Search by... Author
Show... All Results Primary Sources Peer-reviewed

Oops!

An unknown error has occurred. Please click the button below to reload the page. If the problem persists, please try again in a little while.