Efficacy evaluation of antihypertensive drugs based on targeted maximum likelihood estimation
Abstract:
Objective To evaluate, using targeted maximum likelihood estimation (TMLE), the average treatment effect and the individualized treatment effect of captopril versus nitrendipine on blood pressure control in patients with hypertension, in a cohort built on the "Comprehensive Prevention and Control Project of Hypertension and Diabetes for All Populations" in Jiaonan, Shandong Province, and thereby to support precision medicine for hypertension control in the era of big data.
Methods Patients in the cohort who took only captopril or only nitrendipine were selected. The outcome was whether blood pressure was controlled at the first follow-up. Age, sex, occupation, BMI, smoking, alcohol consumption, and exercise were included as covariates. A TMLE model embedding the Super Learner ensemble prediction algorithm was used to fit an initial estimate of the conditional mean of the outcome and then to fluctuate (update) that initial fit so as to optimize the bias-variance trade-off for the target parameter. The average treatment effect was obtained from the updated fit, and individualized treatment effects were then analyzed.
Results A total of 13 676 patients with hypertension were included. Overall, nitrendipine was more favorable for blood pressure control than captopril (OR=1.24, 95% CI: 1.13-1.35, P=0.004). In terms of individual net effect, 98.65% of patients had better blood pressure control with nitrendipine.
Conclusion TMLE can be used to estimate both the average treatment effect and the individualized treatment effect, and offers a methodological reference for causal inference in real-world studies.
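The estimation steps described in the Methods (initial outcome fit, treatment mechanism, fluctuation, plug-in estimate) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are simulated, there is a single hypothetical binary confounder `W`, and simple stratified estimates stand in for the Super Learner fits. The names `simulate`, `tmle`, and all coefficients are assumptions made for illustration only.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def simulate(n, seed=0):
    """Hypothetical cohort: W = binary confounder, A = drug (1/0), Y = controlled (1/0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < expit(-0.5 + 1.0 * w))            # W affects treatment
        y = int(rng.random() < expit(-1.0 + 0.3 * a + 1.0 * w))  # W affects outcome
        rows.append((w, a, y))
    return rows


def tmle(rows):
    # Step 1: initial outcome fit Q(a, w). Deliberately crude here: arm means
    # that ignore W (the paper uses a Super Learner fit instead).
    q_arm = {a: mean(y for _, a_i, y in rows if a_i == a) for a in (0, 1)}
    # Step 2: treatment mechanism g(w) = P(A=1 | W=w), estimated within strata of W.
    g = {w: mean(a for w_i, a, _ in rows if w_i == w) for w in (0, 1)}
    # Step 3: fluctuation. Fit epsilon in logit Q* = logit Q + epsilon * H by
    # Newton's method, where H = A/g(W) - (1-A)/(1-g(W)) is the clever covariate.
    h = [a / g[w] - (1 - a) / (1 - g[w]) for w, a, _ in rows]
    offset = [logit(q_arm[a]) for _, a, _ in rows]
    eps = 0.0
    for _ in range(50):
        p = [expit(o + eps * h_i) for o, h_i in zip(offset, h)]
        score = sum(h_i * (row[2] - p_i) for h_i, row, p_i in zip(h, rows, p))
        info = sum(h_i * h_i * p_i * (1 - p_i) for h_i, p_i in zip(h, p))
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break
    # Step 4: updated counterfactual predictions under A=1 and A=0 for everyone.
    q1 = [expit(logit(q_arm[1]) + eps / g[w]) for w, _, _ in rows]
    q0 = [expit(logit(q_arm[0]) - eps / (1 - g[w])) for w, _, _ in rows]
    m1, m0 = mean(q1), mean(q0)
    return {
        "ate": m1 - m0,                                   # average effect (risk difference)
        "or": (m1 / (1 - m1)) / (m0 / (1 - m0)),          # marginal odds ratio
        "share_benefiting": mean(float(a > b) for a, b in zip(q1, q0)),
        "naive": q_arm[1] - q_arm[0],                     # unadjusted arm difference
        "epsilon": eps,
    }


if __name__ == "__main__":
    print(tmle(simulate(20000, seed=1)))
```

In this simulation the unadjusted arm difference is confounded by `W`, while the targeted estimate is pulled back toward the true effect even though the initial outcome fit ignores `W`. The per-patient difference between the two updated predictions plays the role of the individual net effect.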
Key words:
- Causal inference
- Real world study
- Super Learner
- Targeted maximum likelihood estimation
- Captopril
- Nitrendipine
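Table 1. R packages used to construct the prediction models

| Prediction model | R package |
| --- | --- |
| Decision tree | rpart |
| Multivariate adaptive regression splines | earth |
| Generalized boosted regression model | gbm |
| Generalized linear model | glm |
| k-nearest neighbors | KernelKnn |
| Multivariate adaptive polynomial spline regression | polspline |
| Quadratic discriminant analysis | MASS |
| Random forest | randomForest |
| Recursive partitioning trees | ranger |
| AIC-based stepwise model selection | stats |
| Gradient boosting | xgboost |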
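Table 2. Basic characteristics of 13 676 patients with hypertension in Jiaonan City, 2012-2015 [n(%)]

| Variable | Overall | Captopril group | Nitrendipine group | t/χ² value | P value |
| --- | --- | --- | --- | --- | --- |
| Age (mean±SD, years) | 65.58±11.21 | 65.66±11.26 | 65.19±10.99 | 1.911 | 0.056 |
| Sex | | | | 6.064 | 0.014 |
| Male | 5 368(39.25) | 4 460(39.74) | 908(37.03) | | |
| Female | 8 308(60.75) | 6 764(60.26) | 1 544(62.97) | | |
| Occupation | | | | 51.005 | <0.001 |
| Agriculture, forestry, animal husbandry, fishery, and water conservancy workers | 11 327(82.82) | 9 186(81.84) | 2 141(87.31) | | |
| Professional and technical personnel, government staff, military personnel | 117(0.86) | 93(0.83) | 24(0.98) | | |
| Production equipment operators, commercial and service workers, clerical staff | 242(1.77) | 206(1.84) | 36(1.47) | | |
| Homemakers, retirees, and unemployed | 1 000(7.31) | 858(7.64) | 142(5.79) | | |
| Other, not readily classified | 990(7.24) | 881(7.85) | 109(4.45) | | |
| BMI (mean±SD, kg/m²) | 24.83±2.99 | 24.84±2.95 | 24.83±3.18 | 0.119 | 0.906 |
| Exercise | | | | 2.667 | 0.102 |
| Exercises | 7 457(54.53) | 6 157(54.86) | 1 300(53.02) | | |
| Does not exercise | 6 219(45.47) | 5 067(45.14) | 1 152(46.98) | | |
| Smoking | | | | 6.933 | 0.008 |
| Smoker | 2 575(18.83) | 2 160(19.24) | 415(16.92) | | |
| Non-smoker | 11 101(81.17) | 9 064(80.76) | 2 037(83.08) | | |
| Alcohol consumption | | | | 2.877 | 0.090 |
| Drinker | 1 774(12.97) | 1 482(13.20) | 292(11.91) | | |
| Non-drinker | 11 902(87.03) | 9 742(86.80) | 2 160(88.09) | | |
| Blood pressure control | | | | 19.604 | <0.001 |
| Controlled | 4 345(31.77) | 3 473(30.94) | 872(35.56) | | |
| Uncontrolled | 9 331(68.23) | 7 751(69.06) | 1 580(64.44) | | |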
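As a worked check on the group comparisons in Table 2, the χ² value reported for blood pressure control appears to be consistent with a Pearson chi-square test with Yates continuity correction on the 2x2 table of counts. The sketch below assumes that test; the source does not state which variant was used, and the function name `chi2_yates_2x2` is introduced here for illustration.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns the statistic and its p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    num = n * max(abs(a * d - b * c) - n / 2.0, 0.0) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: captopril, nitrendipine. Columns: controlled, uncontrolled (Table 2 counts).
chi2, p = chi2_yates_2x2(3473, 7751, 872, 1580)
print(round(chi2, 3), p < 0.001)
```

These are unadjusted comparisons between the two treatment groups; the covariate imbalances they reveal (for example in sex, occupation, and smoking) are what motivate the adjusted analysis.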
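Table 3. Super Learner weight coefficients for modeling Q0 and g0

| Prediction model | Q0 weight coefficient | g0 weight coefficient |
| --- | --- | --- |
| Decision tree | 0.007 | 0.105 |
| Multivariate adaptive regression splines | 0.247 | 0.130 |
| Generalized boosted regression model | 0.191 | 0.000 |
| Generalized linear model | 0.000 | 0.301 |
| k-nearest neighbors | 0.029 | 0.000 |
| Multivariate adaptive polynomial spline regression | 0.000 | 0.145 |
| Quadratic discriminant analysis | 0.000 | 0.137 |
| Random forest | 0.066 | 0.000 |
| Recursive partitioning trees | 0.037 | 0.156 |
| AIC-based stepwise model selection | 0.285 | 0.000 |
| Gradient boosting | 0.139 | 0.027 |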
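To show how weight coefficients such as those in Table 3 are used, the sketch below forms a Super Learner style prediction as a weighted average of the candidate models' predictions, using the Q0 weights from the table. The candidate predictions are hypothetical numbers, the dictionary keys are labels chosen here, and the weights are renormalized because the rounded values do not sum to exactly 1.

```python
# Q0 weights from Table 3; keys are illustrative labels for the candidate models.
Q0_WEIGHTS = {
    "decision_tree": 0.007, "mars": 0.247, "gbm": 0.191, "glm": 0.000,
    "knn": 0.029, "polspline": 0.000, "qda": 0.000, "random_forest": 0.066,
    "ranger": 0.037, "step_aic": 0.285, "xgboost": 0.139,
}


def super_learner_predict(candidate_preds, weights):
    """Weighted average of candidate predictions, with weights renormalized to sum to 1."""
    total = sum(weights.values())
    return sum(weights[name] * candidate_preds[name] for name in weights) / total


# Hypothetical predicted probabilities of blood pressure control for one patient.
preds = {name: 0.30 for name in Q0_WEIGHTS}
preds["mars"] = 0.36
preds["step_aic"] = 0.34
print(round(super_learner_predict(preds, Q0_WEIGHTS), 4))
```

Models with a weight of 0.000 contribute nothing to the combined prediction, so for Q0 the ensemble is driven mainly by the stepwise, spline, and boosting candidates, while for g0 the generalized linear model carries the largest weight.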