Academic journal article Geographical Analysis

Quantifying the Effects of Mask Metadata Disclosure and Multiple Releases on the Confidentiality of Geographically Masked Health Data

The availability of individual-level health data presents opportunities for monitoring the distribution and spread of emergent, acute, and chronic conditions, as well as challenges with respect to maintaining the anonymity of persons with health conditions. Particularly when such data are mapped as point locations, concerns arise about how easily individual identities may be determined: geographic coordinates can be linked to digital street networks, then to residential addresses and, finally, to the names of occupants at specific addresses. The utility of such data sets must therefore be balanced against the requirement to protect the confidentiality of individuals whose identities might be revealed through the availability of precise and accurate locational data. Recent literature has pointed toward geographic masking as a means of striking an appropriate balance between data utility and confidentiality. However, questions remain as to whether certain characteristics of the mask (mask metadata) should be disclosed to data users and whether two or more distinct masked versions of the data can be released without breaching confidentiality. In this article, we address these questions by quantifying the extent to which the disclosure of mask metadata and the release of multiple masked versions may affect confidentiality, with a view toward providing guidance to custodians of health data sets. The masks considered include perturbation, areal aggregation, and their combination. Confidentiality is measured by the areas of confidence regions for individuals' locations, which are derived under the probability models governing the masks, conditioned on the disclosed mask metadata.
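The confidence-region measure described above can be illustrated with a minimal sketch. It assumes, hypothetically, that perturbation adds independent isotropic bivariate Gaussian noise whose standard deviation `sigma` (in meters) is disclosed as mask metadata, and that areal aggregation reports only a square cell of disclosed side length within which the true location is uniformly distributed. These distributions and the function names `perturbation_region_area` and `aggregation_region_area` are illustrative assumptions, not the probability models specified in the article.

```python
import math


def perturbation_region_area(sigma: float, alpha: float = 0.05, n_releases: int = 1) -> float:
    """Area of a (1 - alpha) confidence disk for an individual's true location.

    Hypothetical model: each released version adds independent isotropic
    bivariate Gaussian noise with per-coordinate standard deviation sigma,
    and sigma is disclosed as mask metadata. The disk is centred on the mean
    of the n_releases masked points.
    """
    if sigma <= 0 or not 0 < alpha < 1 or n_releases < 1:
        raise ValueError("require sigma > 0, 0 < alpha < 1, n_releases >= 1")
    # Averaging n independent releases divides the per-coordinate variance by n.
    variance = sigma ** 2 / n_releases
    # Squared distance / variance follows a chi-square law with 2 degrees of
    # freedom, whose (1 - alpha) quantile is exactly -2 ln(alpha).
    radius_sq = -2.0 * math.log(alpha) * variance
    return math.pi * radius_sq


def aggregation_region_area(cell_side: float, alpha: float = 0.05) -> float:
    """Area of a (1 - alpha) confidence region under hypothetical areal aggregation.

    Assumes each point is reported only as the square cell (side length
    cell_side, disclosed as metadata) that contains it, and that the true
    location is uniformly distributed within that cell. Any sub-region
    holding a (1 - alpha) share of the cell's area is then a valid region.
    """
    if cell_side <= 0 or not 0 < alpha < 1:
        raise ValueError("require cell_side > 0 and 0 < alpha < 1")
    return (1.0 - alpha) * cell_side ** 2


if __name__ == "__main__":
    one = perturbation_region_area(100.0)                # sigma = 100 m, one release
    two = perturbation_region_area(100.0, n_releases=2)  # two independent releases
    print(f"one release:  {one:,.0f} m^2")
    print(f"two releases: {two:,.0f} m^2")               # half the single-release area
    print(f"500 m cells:  {aggregation_region_area(500.0):,.0f} m^2")
```

Under these assumptions, disclosing `sigma` lets any data user compute the confidence disk exactly, and releasing two independently perturbed versions halves its area, because averaging the releases halves the variance of the estimate of the true location. A combined mask (perturbation followed by aggregation) would require composing the two models and is not covered by this sketch.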

Introduction

Access to data on the geographic distribution of health conditions is important to public health officials, academic researchers, and the general public. Public health officials need spatially linked health information in order to direct prevention and control activities to areas of need; researchers require access in order to conduct spatial analyses addressing important scientific and public policy questions, many of which may not have been envisioned when the data were originally collected; and ordinary citizens naturally want access to information that is relevant to their own health status (e.g., the locations of unusually high rates of cancer and other diseases). However, particularly since the 1996 enactment of the Health Insurance Portability and Accountability Act (HIPAA) in the United States and the regulations issued under it, the legal requirements of maintaining the confidentiality of health information have led to heightened concern about disclosure of individual-level health information and to increased interest in developing strategies that permit access while protecting confidentiality. This is particularly so for health data that include the geographic coordinates of affected individuals, because inverse address matching technology can reveal the street address of the domicile at a point location and the names of its residents, making disclosure of individual identities straightforward. Thus, it is widely recognized, within government agencies and other custodians of health data as well as within the research community, that methods are needed for providing access to data with sufficient detail to understand and evaluate the spatial distribution of health conditions, while at the same time sufficiently preserving the anonymity of individuals.

An increasingly important strategy for permitting access to sensitive data while protecting individual identities is to "mask" the data before releasing them to legitimate users. Masking includes but is not limited to the removal or encryption of obvious identifiers such as names, residential addresses, and Social Security numbers; it may also involve more sophisticated statistical disclosure limitation procedures such as sampling, adding simulated data, grouping, and swapping (Duncan and Pearson 1991). …
