Academic journal article Journal of Law and Health

The Ethics in Synthetics: Statistics in the Service of Ethics and Law in Health-Related Research in Big Data from Multiple Sources

Article excerpt

I. INTRODUCTION
II. A DELICATE EQUILIBRIUM
III. DIFFERENT SCOPES OF PROTECTIVE REGULATIONS
IV. AN AUTHORIZATION TO USE INFORMATION FOR HEALTH-RELATED RESEARCH
V. POSSIBLE EXEMPTIONS FROM THE AUTHORIZATION REQUIREMENT
VI. SYNTHETIC DATA AS A MEANS TO FULFILL ETHICAL REQUIREMENTS
VII. A NEW RISK-BENEFIT BALANCE
VIII. CONCLUSION


An ethical advancement of scientific knowledge demands a delicate equilibrium between benefits and harms, particularly in health-related research. Under Article 4 of UNESCO's Universal Declaration on Bioethics and Human Rights, ethically justifiable research requires maximizing direct and indirect benefits and minimizing possible harms when applying and advancing scientific knowledge or technologies. (1) The National Institutes of Health [NIH] Data Sharing Policy and Implementation Guidance similarly states that data necessary for drawing valid conclusions and advancing medical research should be made as widely and freely available as possible (in order to share the benefits), while safeguarding the privacy of participants from potentially harmful disclosure of sensitive information. (2) This paper discusses the challenges of maximizing research benefit and minimizing potential harms in the unique context of health-related research in Big Data from multiple sources, which are differently protected by the law.

Part I frames the ethical dilemma by discussing potential benefits and harms, showing the constant misalignment, in health-related research in Big Data from multiple sources, between the benefits of using confidential information for scientific purposes and the value of keeping it confidential. In Part II, the paper addresses existing regulations, their nature, and their legal coverage. It highlights the challenges that arise when combining data from multiple sources that are differently protected by the law. Part III compares different requirements for consent or authorization to use persons' health information for research. It focuses on the difficulty existing regulation has in ensuring that those requirements are met when multiple sources of data are used. Part IV investigates whether exemptions from the authorization requirement could prevail in the context of information that exceeds the protection of HIPAA and the Protection of Human Subjects Regulations. In Part V, the paper proposes a solution of a statistical nature, using the method of synthetic data to balance conflicting considerations. Part VI shows how the use of synthetic data can overcome some of the ethical challenges.


The term "Big Data" is differently defined by users and policy makers. What it means is dramatically different to the media, business, health, or academic statistics communities, and to different regulatory bodies. (3) To our knowledge, there is no gold standard definition. Big Data is considered data on a massive scale in terms of volume, intensity, and complexity that exceed the ability of standard software tools to manage and analyze. (4) But also, "It is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets." (5) Laney coined the definition in the Big Data analytics world: volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources)--the 3V definition. (6)

As life is being recorded and quantified in ways hard to imagine a decade ago, there is great promise in Big Data research, in particular for health purposes. The literature often addresses medical records as the source for health-related research in Big Data. (7) For example, electronic records document multiple aspects of medical care: quantitative and qualitative data of patients, imaging records, providers' documentation of health care delivery (medication and other services), narratives, and genetic information, all of which provide important information on a person's physical condition. …
