Introduction Primary and Secondary Purposes The Spectrum of Risk for Data Access Managing Risk What Is De-identification? Learning Something New The Status Quo Safe Harbor-Compliant Data Can Have a High Risk of Re-identification The Adversary Knows Who Is in the Data The Data Set Is Not a Random Sample from the U.S. Population Other Fields Can Be Used for Re-identification Moving Forward beyond Safe Harbor Why We Wrote This Book References

THE CASE FOR DE-IDENTIFYING PERSONAL HEALTH INFORMATION
Permitted Disclosures, Consent, and De-identification of PHI Common Data Flows The Need for De-identification Permitted Uses and Disclosures of Health Information Uses of Health Information by an Agent Disclosing Identifiable Data When Permitted References
The Impact of Consent Differences between Consenters and Non-Consenters in Clinical Trials The Impact of Consent on Observational Studies Impact on Recruitment Impact on Bias Impact on Cost Impact on Time References
Data Breach Notifications Benefits and Costs of Breach Notification Cost of Data Breach Notifications to Custodian Data Breach Trends The Value of Health Data Financial Information in the Health Records Financial Value of Health Records Medical Identity Theft Monetizing Health Records through Extortion References
Peeping and Snooping Examples of Peeping Information and Privacy Commissioners Orders Ontario HO-002 HO-010 HR06-53 HI-050013-1 Alberta Investigation Report H2011-IR-004 IPC Investigation (Report Not Available) Saskatchewan H-2010-001 References
Unplanned but Legitimate Uses and Disclosures Unplanned Uses by Governments Data Sharing for Research Purposes Open Government Open Data for Research Unplanned Uses and Disclosures by Commercial Players Competitions References
Public Perception and Privacy Protective Behaviors References
Alternative Methods for Data Access Remote Access On-Site Access Remote Execution Remote Queries Secure Computation Summary References

UNDERSTANDING DISCLOSURE RISKS
Scope, Terminology, and Definitions Perspective on De-identification Original Data and DFs Unit of Analysis Types of Data Relational Data Transactional Data Sequential Data Trajectory Data Graph Data The Notion of an Adversary Types of Variables Directly Identifying Variables Indirectly Identifying Variables (Quasi-identifiers) Sensitive Variables Other Variables Equivalence Classes Aggregate Tables References
Frequently Asked Questions about De-identification Can We Have Zero Risk? Will All DFs Be Re-identified in the Future? Is a Data Set Identifiable If a Person Can Find His or Her Record? Can De-identified Data Be Linked to Other Data Sets? Doesn't Differential Privacy Already Provide the Answer?
A Methodology for Managing Re-identification Risk Re-identification Risk versus Re-identification Probability Re-identification Risk for Public Files Managing Re-identification Risk References
Definitions of Identifiability Definitions Common Framework for Assessing Identifiability References
Data Masking Methods Suppression Randomization Irreversible Coding Reversible Coding Reversible Coding, HIPAA, and the Common Rule Other Techniques That Do Not Work Well Constraining Names Adding Noise Character Scrambling Character Masking Truncation Encoding Summary References
Theoretical Re-identification Attacks Background Knowledge of the Adversary Re-identification Attacks Example of a Linking Attack on Relational Data Example of a Linking Attack on Transaction Data Example of a Linking Attack on Sequential Data Example of a Linking Attack on Trajectory Data Example of a Linking Attack Based on Semantic Information References

MEASURING RE-IDENTIFICATION RISK
Measuring the Probability of Re-identification Simple and Derived Metrics Simple Risk Metrics: Prosecutor and Journalist Risk Measuring Prosecutor Risk Measuring Journalist Risk Applying the Derived Metrics and Decision Rules Relationship among Metrics References
Measures of Uniqueness Uniqueness under Prosecutor Risk Uniqueness under Journalist Risk Summary References
Modeling the Threat Characterizing the Adversaries Attempting a Re-identification Attack Plausible Adversaries An Internal Adversary An External Adversary What Are the Quasi-identifiers? Sources of Data Correlated and Inferred Variables References
Choosing Metric Thresholds Choosing the ¿ Threshold Choosing the ¿ and ¿ Thresholds Choosing the Threshold for Marketer Risk Choosing among Thresholds Thresholds and Incorrect Re-identification References

PRACTICAL METHODS FOR DE-IDENTIFICATION
De-identification Methods Generalization Principles Optimal Lattice Anonymization (OLA) Tagging Records to Suppress Suppression Methods Overview Fast Local Cell Suppression Available Tools Case Study: De-identification of the BORN Registry General Parameters Attack T1 Attack T2 Attack T3 Summary of Risk Assessment and De-identification References
Practical Tips Disclosed Files Should Be Samples Disclosing Multiple Samples Creating Cohorts Cohort Defined on Quasi-identifiers Only Cohort Defined on a Non-Quasi-identifier Cohort Defined on Non-Quasi-identifiers and Quasi-identifiers Impact of Data Quality Publicizing Re-identification Risk Assessment Adversary Power Levels of Adversary Background Knowledge De-identification in the Context of a Data Warehouse References

END MATTER
An Analysis of Historical Breach Notification Trends Methods Definitions Breach Lists Original Data Sources Sponsors of Lists Data Quality Estimating the Number of Disclosed Breaches Data Collection Interrater Agreement Results Discussion Summary of Main Results Post Hoc Analysis References
Methods of Attack for Maximum Journalist Risk Method of Attack 1 Method of Attack 2 Method of Attack 3 How Many Friends Do We Have? References
Cell Size Precedents References
The Invasion of Privacy Construct 6B Dimensions Sensitivity of the Data Potential Injury to Consumers Appropriateness of Consent
General Information on Mitigating Controls Introduction Origins of the MCI Subject of Assessment: Data Requestor versus Data Recipient Applicability of the MCI Structure of the MCI Scoring Which Practices to Rate Third-Party versus Self-Assessment Scoring the MCI Interpreting to the MCI Questions General Justifications for Time Intervals Practical Requirements Remediation Controlling Access, Disclosure, Retention, and Disposition of Personal Data Safeguarding Personal Data Ensuring Accountability and Transparency in the Management of Personal Data
Assessing Motives and Capacity Dimensions Motives to Re-identify the Data Capacity to Re-identify the Data.
Guide to the De-Identification of Personal Health Information