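Lecture title: Balancing Inferential Integrity and Disclosure Risk via Model Targeted Masking and Multiple Imputation
Speaker: Bei Jiang (姜蓓), University of Alberta, Canada
Time: Saturday, December 11, 2021, 09:30
Venue: Zoom, Meeting ID: 697 064 8295, Passcode: 123456
About the speaker:
Bei Jiang is an Associate Professor and doctoral supervisor in the Department of Mathematical and Statistical Sciences at the University of Alberta, Canada. Research interests include privacy-preserving data analysis, Bayesian hierarchical modeling, and joint modeling for multi-view data integration. This research has been widely applied in women's health, mental health, neurology, ecology, and other fields. Jiang has published more than thirty papers in journals and conferences including JASA, JRSSC, and NeurIPS.
Abstract: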
There is a growing expectation that data collected by government-funded studies should be openly available to ensure research reproducibility, which also increases concerns about data privacy. A strategy to protect individuals' identity is to release multiply imputed (MI) synthetic datasets with masked sensitive values (Rubin, 1993). However, information loss or incorrectly specified imputation models can weaken or invalidate the inferences obtained from the MI-datasets. We propose a new masking framework with a data-augmentation (DA) component and a tuning mechanism that balances protection against identity disclosure with preservation of data utility. Applying it to a restricted-use Canadian Scleroderma Research Group (CSRG) dataset, we found that this DA-MI strategy achieved a 0% identity disclosure risk and preserved all inferential conclusions. It yielded 95% confidence intervals (CIs) that had overlaps of 98.5% (95.5%) on average with the CIs constructed using the full, unmasked CSRG dataset in a work-disability (interstitial lung disease) study. The CI-overlaps were lower for several other methods considered, ranging from 73.9% to 91.9% on average with the lowest value being 28.1%; such low CI-overlaps further led to some incorrect inferential conclusions. These findings indicate that the DA-MI masking framework facilitates sharing of useful research data while protecting participants' identities.
Invited by: Xiaodong Yan (严晓东)
All faculty and students are warmly welcome to attend!