Academic journal article Harvard Journal of Law & Technology

Get to Know Me: Protecting Privacy and Autonomy under Big Data's Penetrating Gaze


TABLE OF CONTENTS

I. INTRODUCTION
II. CURRENT CONCEPTIONS OF PERSONAL INFORMATION
   A. Privacy Theories
   B. Privacy Laws
   C. Privacy Policies
III. CHARACTERISTICS OF BIG DATA
   A. Data Collection Is Constant and Imperceptible
   B. New Insights Are Generated
   C. Inferred Information Is Often Sensitive
   D. Discovered Correlations Are Unexpected
IV. PRIVACY AND AUTONOMY HARMS FROM BIG DATA
   A. Use Harms
   B. Non-Use Harms
      1. Learning Private Information
      2. Limiting Autonomy
      3. Impeding Anonymity
      4. Eroding Belief in Human Agency
V. ASSESSING ALGORITHMS AND HARMS
VI. CONCLUSION

I. INTRODUCTION

Big data, the storage and analysis of large datasets, now affects everyday life. (1) It personalizes ads, calculates criminal sentences, and predicts criminal activity or, recast in a different light, constructs filter bubbles, (2) violates rights of procedural due process, and enables police departments to target communities on a discriminatory basis. (3) Both the benefits and dangers of big data's applications have been widely discussed in popular discourse and legal literature. (4) But before companies and governments can use big data to provide services or make decisions, inferences must first be derived about the people within datasets. Big data compiles, analyzes, evaluates, and predicts a person's actions and attributes, all before the conclusions are put to any business or state purpose.

Current privacy discussions are predominantly concerned with how inferred information is used. (5) This Note, however, proposes that the process of analyzing data to infer information about people also threatens their privacy and autonomy interests. This Note proceeds in four parts: Part II summarizes current academic, legal, and industry conceptions of informational privacy and argues they have failed to consider the harm potentially posed by big data's capability of inferring new personal information; Part III considers the novel and unique characteristics of big data collection and analytics; Part IV discusses how big data threatens privacy and autonomy interests by making inferential conclusions about people's attributes and conduct, even if the conclusions are never used; and Part V proposes a framework to differentiate between data analysis that is innocuous and harmful. The framework states that a data-mining algorithm violates privacy and autonomy interests if: (1) it relies on an unexpected correlation between data points, (2) it infers personal information of a particularly sensitive nature, and (3) generating the inference breaches contextual integrity.

II. CURRENT CONCEPTIONS OF PERSONAL INFORMATION

Privacy has traditionally been difficult to define and regulate. Despite disagreement over how best to treat the issue, privacy theories, privacy laws, and privacy policies share a common characteristic: they conceptualize personal information as static pieces of knowledge about a person. Part II makes this observation by examining each in turn.

A. Privacy Theories

A fundamental theory of privacy defines privacy as the control over personal information. In his seminal book on privacy, privacy scholar Alan Westin articulates the control theory as "the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others." (6) Legal scholar Arthur Miller writes that privacy is "the individual's ability to control the circulation of information relating to him." (7) In other words, the privacy-as-control perspective concludes that a person maintains privacy when she can decide how her information is collected, shared, used, retained, or otherwise manipulated.

Before big data, maintaining control over the data one shared with others necessarily meant controlling one's personal information. If a viewer voluntarily gave Netflix her ratings of certain movies and decided how Netflix could share, use, and retain the ratings, she maintained control over the information. …
