Integrative High Dimensional Inference with Heterogeneity under Data Sharing Constraints - Zhongtai Securities Institute for Financial Studies



Published: 2021-11-29

Lecture title: Integrative High Dimensional Inference with Heterogeneity under Data Sharing Constraints

Speaker: Yin Xia, Fudan University

Time: Saturday, December 4, 2021, 09:30-10:30 a.m.

Venue: Tencent Meeting, ID: 711-672-323

About the speaker:

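Yin Xia is a professor and doctoral supervisor at the School of Management, Fudan University. Xia received a Ph.D. from the University of Pennsylvania in 2013 and served as a tenure-track Assistant Professor at the University of North Carolina at Chapel Hill from 2013 to 2016. Xia was selected as a national-level Distinguished Professor in 2016 and was awarded the Excellent Young Scientists Fund of the National Natural Science Foundation of China in 2020. Research interests include high-dimensional statistical inference, large-scale testing, and their applications. Xia has published more than twenty papers in journals including JASA, AOS, JRSSB, and Biometrika.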

Abstract:

Evidence-based decision making often relies on meta-analyzing multiple studies, which enables more precise estimation and investigation of generalizability. Integrative analysis of multiple heterogeneous studies is, however, highly challenging in the ultra-high-dimensional setting. The challenge is even more pronounced when individual-level data cannot be shared across studies, a restriction known as the DataSHIELD constraint (Wolfson et al., 2010). Under sparse regression models that are assumed to be similar yet not identical across studies, we propose in this paper a novel integrative estimation procedure for data-Shielding High-dimensional Integrative Regression (SHIR). SHIR protects individual data through a summary-statistics-based integration procedure, accommodates between-study heterogeneity in both the covariate distribution and the model parameters, and attains consistent variable selection. Theoretically, SHIR is statistically more efficient than existing distributed approaches that integrate debiased LASSO estimators from the local sites. Furthermore, the estimation error incurred by aggregating derived data is negligible compared to the statistical minimax rate, and SHIR is shown to be asymptotically equivalent in estimation to the ideal estimator obtained by sharing all data. Next, we propose a novel data-shielding integrative large-scale testing approach to signal detection that allows between-study heterogeneity and does not require sharing of individual-level data. Assuming that the underlying high-dimensional regression models differ across studies yet share similar support, the proposed method incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate and false discovery proportion.
The new method is applied to a real example on detecting the interaction effects of genetic variants for statins and obesity on the risk of type II diabetes.
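
To make the idea of summary-statistics-based integration concrete, the following is a minimal sketch and not the SHIR procedure itself. It assumes a much simpler setting than the abstract: low-dimensional ordinary least squares with one coefficient vector shared by all studies, with no sparsity penalty and no between-study heterogeneity. The function names `local_summary` and `integrate`, the simulated data, and the optional `ridge` parameter are all hypothetical and chosen for illustration only. The sketch shows that each site can release only aggregate quantities (its Gram matrix and cross-product vector) and the combined fit still matches the fit obtained by pooling individual-level rows.

```python
import numpy as np

def local_summary(X, y):
    """Run inside one study: only aggregate quantities leave the site."""
    return X.T @ X, X.T @ y, X.shape[0]

def integrate(summaries, ridge=0.0):
    """Combine per-site summaries into a single least-squares fit.

    Simplifying assumption: every site shares the same coefficient vector.
    """
    p = summaries[0][0].shape[0]
    gram = sum(s[0] for s in summaries) + ridge * np.eye(p)
    cross = sum(s[1] for s in summaries)
    return np.linalg.solve(gram, cross)

# Simulated data for three hypothetical studies of different sizes
rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.0, 0.5])
sites = []
for n in (80, 120, 100):
    X = rng.normal(size=(n, beta.size))
    y = X @ beta + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

# Fit using only the summaries released by each site
beta_summary = integrate([local_summary(X, y) for X, y in sites])

# Reference fit that pools the individual-level rows (not allowed under
# a data-sharing constraint; computed here only for comparison)
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
```

In this simplified setting the two estimates agree up to floating-point error, because the pooled normal equations depend on the data only through the summed Gram matrices and cross-products. The high-dimensional, heterogeneous setting described in the abstract requires additional machinery (penalization and debiasing) that this sketch does not attempt.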

Host: Xiaodong Yan

All faculty and students are welcome to attend!
