Application of Logistic regression and decision tree analysis in early warning indicators of hypertension and diabetes comorbidity
Abstract:
Objective To describe the prevalence and warning indicators of hypertension-diabetes comorbidity (HDC) among adults in Hubei Province, so as to provide a scientific basis for the prevention and control of HDC.
Methods A cross-sectional questionnaire survey was conducted among residents aged ≥18 years, selected by multi-stage stratified random sampling from 11 districts of Hubei Province. A Logistic regression model and a decision tree model were used to analyze the warning indicators of HDC. The receiver operating characteristic (ROC) curve was used to evaluate the predictive performance of the two models.
Results Both the Logistic regression model and the decision tree model identified work intensity, smoking, marital status, gender, BMI and age as warning indicators of HDC (all P < 0.05). The area under the ROC curve was larger for the Logistic regression model than for the decision tree model (0.967 vs. 0.933, Z = 9.199, P < 0.001).
Conclusions The Logistic regression model predicted HDC better than the decision tree model. Nevertheless, the two methods should be combined when identifying warning indicators of HDC: significant main-effect indicators should first be screened with Logistic regression, and the interactions among those indicators should then be examined with the decision tree model, so as to provide a reference for the prevention and control of HDC.
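As a rough illustration of how the area under the ROC curve summarizes a model's discrimination, the sketch below computes the AUC as the proportion of (case, non-case) pairs in which the case receives the higher predicted score, counting ties as one half. The function name `auc` and the toy labels and scores are hypothetical and are not taken from the study data. The Z statistic for comparing two AUCs additionally needs their standard errors, which are not reported here, so it is not reproduced.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # scores of cases
    neg = [s for y, s in zip(labels, scores) if y == 0]  # scores of non-cases
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = HDC, 0 = non-HDC
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
```

The pairwise form is quadratic in sample size, which is fine for a small illustration; a rank-based implementation would be preferable for a sample as large as the one surveyed here.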
Key words:
- Hypertension
- Diabetes
- Comorbidity
- Warning indicator
- Logistic regression
- Decision tree model
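Table 1. Univariate analysis of factors influencing HDC [n (%)]

| Variable | Total | HDC (n = 713) | Non-HDC (n = 24 643) | χ² | P |
|---|---|---|---|---|---|
| Gender | | | | 18.49 | < 0.001 |
| Male | 12 214 (48.2) | 400 (56.1) | 11 814 (47.9) | | |
| Female | 13 142 (51.8) | 313 (43.9) | 12 829 (52.1) | | |
| Age (years) | | | | 386.08 | < 0.001 |
| 18~<40 | 10 130 (40.0) | 34 (4.8) | 10 096 (41.0) | | |
| 40~<60 | 8 717 (34.4) | 361 (50.6) | 8 356 (33.9) | | |
| ≥60 | 6 509 (25.7) | 318 (44.6) | 6 191 (25.1) | | |
| Marital status | | | | 191.61 | < 0.001 |
| Unmarried | 2 012 (7.9) | 77 (10.8) | 1 935 (7.8) | | |
| Married | 21 328 (84.1) | 485 (68.0) | 20 843 (84.6) | | |
| Divorced/widowed/separated | 2 016 (8.0) | 151 (21.2) | 1 865 (7.6) | | |
| Education level | | | | 361.08 | < 0.001 |
| Illiterate | 2 807 (11.1) | 232 (32.5) | 2 575 (10.4) | | |
| Primary school | 7 523 (29.7) | 179 (25.1) | 7 344 (29.8) | | |
| Junior high school | 8 515 (33.6) | 210 (29.5) | 8 305 (33.7) | | |
| Senior high school | 3 368 (13.3) | 45 (6.3) | 3 323 (13.5) | | |
| College or above | 3 143 (12.4) | 47 (6.6) | 3 096 (12.6) | | |
| Monthly household income per capita (yuan) | | | | 45.94 | < 0.001 |
| <1 000 | 6 074 (24.0) | 208 (29.2) | 5 866 (23.8) | | |
| 1 000~<1 500 | 8 004 (31.6) | 274 (38.4) | 7 730 (31.4) | | |
| 1 500~<2 000 | 8 639 (34.1) | 165 (23.1) | 8 474 (34.4) | | |
| ≥2 000 | 2 639 (10.4) | 66 (9.3) | 2 573 (10.4) | | |
| Smoking | | | | 783.97 | < 0.001 |
| Yes | 4 384 (17.3) | 402 (56.4) | 3 982 (16.2) | | |
| No | 20 972 (82.7) | 311 (43.6) | 20 661 (83.8) | | |
| Alcohol drinking | | | | 131.80 | < 0.001 |
| Yes | 6 681 (26.3) | 321 (45.0) | 6 360 (25.8) | | |
| No | 18 675 (73.7) | 392 (55.0) | 18 283 (74.2) | | |
| Daily sedentary time (h) | | | | 467.72 | < 0.001 |
| <4 | 16 427 (64.8) | 190 (26.7) | 16 237 (65.9) | | |
| ≥4 | 8 929 (35.2) | 523 (73.3) | 8 406 (34.1) | | |
| Aware that salt intake affects health | | | | 144.55 | < 0.001 |
| Yes | 15 962 (63.0) | 296 (41.5) | 15 666 (63.6) | | |
| No | 9 394 (37.0) | 417 (58.5) | 8 977 (36.4) | | |
| WHtR range | | | | 417.81 | < 0.001 |
| <P25 | 6 421 (25.3) | 58 (8.1) | 6 363 (25.8) | | |
| P25~<P50 | 6 307 (24.9) | 72 (10.1) | 6 235 (25.3) | | |
| P50~<P75 | 6 553 (25.8) | 202 (28.3) | 6 351 (25.8) | | |
| ≥P75 | 6 075 (24.0) | 381 (53.5) | 5 694 (23.1) | | |
| Aware of the recommended daily salt intake per person | | | | 20.50 | < 0.001 |
| Yes | 5 277 (20.8) | 100 (14.0) | 5 177 (21.0) | | |
| No | 20 079 (79.2) | 613 (86.0) | 19 466 (79.0) | | |
| Aware of the recommended daily cooking oil intake per person | | | | 5.37 | 0.020 |
| Yes | 4 568 (18.0) | 105 (14.7) | 4 463 (18.1) | | |
| No | 20 788 (82.0) | 608 (85.3) | 20 180 (81.9) | | |
| Daily salt intake (g/d) | | | | 23.47 | < 0.001 |
| <6 | 1 713 (6.8) | 25 (3.5) | 1 688 (6.8) | | |
| 6~<12 | 2 055 (8.1) | 37 (5.2) | 2 018 (8.2) | | |
| 12~<18 | 8 056 (31.8) | 231 (32.4) | 7 825 (31.8) | | |
| ≥18 | 13 532 (53.4) | 420 (58.9) | 13 112 (53.2) | | |
| Physical exercise | | | | 19.71 | < 0.001 |
| Yes | 4 833 (19.1) | 90 (12.6) | 4 743 (19.3) | | |
| No | 20 523 (80.9) | 623 (87.4) | 19 900 (80.7) | | |
| Aware of chronic disease risk criteria | | | | 33.43 | < 0.001 |
| Yes | 7 048 (27.8) | 130 (18.2) | 6 918 (28.1) | | |
| No | 18 308 (72.2) | 583 (81.8) | 17 725 (71.9) | | |
| WC | | | | 276.58 | < 0.001 |
| Normal | 13 528 (53.4) | 162 (22.7) | 13 366 (54.2) | | |
| Outside normal range | 11 828 (46.7) | 551 (77.3) | 11 277 (45.8) | | |
| BMI (kg/m²) | | | | 682.87 | < 0.001 |
| <18.5 | 2 304 (9.1) | 29 (4.1) | 2 275 (9.2) | | |
| 18.5~<24 | 15 023 (59.3) | 174 (24.4) | 14 849 (60.3) | | |
| 24~<27 | 5 816 (22.9) | 290 (40.7) | 5 526 (22.4) | | |
| ≥27 | 2 213 (8.7) | 220 (30.8) | 1 993 (8.1) | | |
| Work intensity | | | | 3 000.00 | < 0.001 |
| High | 1 428 (5.6) | 370 (51.9) | 1 058 (4.3) | | |
| Medium | 11 429 (45.1) | 277 (38.9) | 11 152 (45.3) | | |
| Low | 12 499 (49.3) | 66 (9.3) | 12 433 (50.4) | | |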
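The χ² values in Table 1 can be checked from the cell counts. The following is a minimal sketch of Pearson's chi-square statistic for a contingency table, assuming no continuity correction was applied (the source does not say). The function name `pearson_chi2` is hypothetical; the counts are the gender rows of Table 1.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Gender rows of Table 1: [HDC, non-HDC] for male and female
gender = [[400, 11814], [313, 12829]]
print(round(pearson_chi2(gender), 2))  # about 18.48, close to the reported 18.49
```

The small gap from the published value is within what rounding or a slightly different software default would produce.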
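Table 2. Multivariate Logistic regression analysis of the warning indicators of HDC

| Variable | β | SE | Wald χ² | OR | 95% CI lower | 95% CI upper | P |
|---|---|---|---|---|---|---|---|
| Gender (reference: female) | | | | | | | |
| Male | -0.557 | 0.123 | 20.577 | 0.57 | 0.45 | 0.73 | < 0.001 |
| Age, years (reference: ≥60) | | | | | | | |
| 18~<40 | -2.594 | 0.219 | 140.648 | 0.08 | 0.05 | 0.12 | < 0.001 |
| 40~<60 | -0.236 | 0.117 | 4.060 | 0.79 | 0.63 | 0.99 | 0.044 |
| Marital status (reference: divorced/widowed/separated) | | | | | | | |
| Unmarried | -1.547 | 0.224 | 47.645 | 0.20 | 0.14 | 0.33 | < 0.001 |
| Married | -2.661 | 0.163 | 268.111 | 0.07 | 0.05 | 0.10 | < 0.001 |
| Education level (reference: college or above) | | | | | | | |
| Illiterate | 0.404 | 0.255 | 2.510 | 1.50 | 0.91 | 2.47 | 0.113 |
| Primary school | -0.375 | 0.243 | 2.381 | 0.69 | 0.43 | 1.11 | 0.123 |
| Junior high school | -0.442 | 0.240 | 3.401 | 0.64 | 0.40 | 1.03 | 0.065 |
| Senior high school | -0.255 | 0.286 | 0.798 | 0.78 | 0.44 | 1.36 | 0.372 |
| Work intensity (reference: low) | | | | | | | |
| High | 4.632 | 0.174 | 710.923 | 102.68 | 73.05 | 144.33 | < 0.001 |
| Medium | 1.838 | 0.157 | 136.515 | 6.28 | 4.62 | 8.55 | < 0.001 |
| Monthly household income per capita (reference: ≥2 000 yuan) | | | | | | | |
| <1 000 yuan | -0.075 | 0.213 | 0.124 | 0.93 | 0.61 | 1.41 | 0.725 |
| 1 000~<1 500 yuan | 0.327 | 0.205 | 2.546 | 1.39 | 0.93 | 2.07 | 0.111 |
| 1 500~<2 000 yuan | -0.258 | 0.210 | 1.508 | 0.77 | 0.51 | 1.17 | 0.219 |
| Smoking (reference: yes) | | | | | | | |
| No | 2.133 | 0.133 | 257.559 | 8.44 | 6.51 | 10.96 | < 0.001 |
| Alcohol drinking (reference: yes) | | | | | | | |
| No | 0.343 | 0.125 | 7.521 | 1.41 | 1.10 | 1.80 | 0.006 |
| Physical exercise (reference: yes) | | | | | | | |
| No | 0.380 | 0.154 | 6.063 | 1.46 | 1.08 | 1.98 | 0.014 |
| Daily sedentary time (reference: <4 h) | | | | | | | |
| ≥4 h | 1.930 | 0.119 | 262.023 | 6.86 | 5.43 | 8.66 | < 0.001 |
| Aware that salt intake affects health (reference: yes) | | | | | | | |
| No | 1.030 | 0.120 | 73.334 | 2.80 | 2.21 | 3.55 | < 0.001 |
| Aware of the recommended daily salt intake (reference: yes) | | | | | | | |
| No | 0.526 | 0.186 | 8.023 | 1.69 | 1.18 | 2.44 | 0.005 |
| Daily salt intake (reference: >18 g) | | | | | | | |
| <6 g | 0.158 | 0.310 | 0.259 | 1.17 | 0.64 | 2.15 | 0.611 |
| 6~<12 g | 0.124 | 0.251 | 0.242 | 1.13 | 0.69 | 1.85 | 0.623 |
| 12~<18 g | 0.116 | 0.224 | 0.224 | 1.12 | 0.70 | 1.81 | 0.636 |
| Aware of the recommended daily cooking oil intake (reference: yes) | | | | | | | |
| No | 0.086 | 0.173 | 0.244 | 1.09 | 0.78 | 1.53 | 0.621 |
| BMI (reference: 18.5~<24 kg/m²) | | | | | | | |
| <18.5 kg/m² | 0.355 | 0.233 | 2.326 | 1.43 | 0.90 | 2.25 | 0.127 |
| 24~<27.0 kg/m² | 1.703 | 0.139 | 151.043 | 5.49 | 4.19 | 7.21 | < 0.001 |
| ≥27.0 kg/m² | 2.084 | 0.158 | 174.807 | 8.04 | 5.90 | 10.95 | < 0.001 |
| Aware of chronic disease risk criteria (reference: yes) | | | | | | | |
| No | 0.152 | 0.139 | 1.196 | 1.17 | 0.89 | 1.53 | 0.274 |
| WHtR (reference: <P25) | | | | | | | |
| P25~<P50 | 0.508 | 0.229 | 4.924 | 1.66 | 1.06 | 2.60 | 0.026 |
| P50~<P75 | 0.941 | 0.225 | 17.466 | 2.56 | 1.65 | 3.99 | < 0.001 |
| ≥P75 | 1.205 | 0.236 | 25.956 | 3.34 | 2.10 | 5.30 | < 0.001 |
| WC (reference: normal) | | | | | | | |
| Outside normal range | 0.107 | 0.164 | 0.426 | 1.11 | 0.81 | 1.53 | 0.514 |
| Constant | -7.003 | 0.505 | 192.475 | 0.001 | | | < 0.001 |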
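The columns of Table 2 are linked by standard relations: OR = exp(β), the 95% CI is exp(β ± 1.96·SE), and the Wald statistic is (β/SE)². The sketch below applies them to the gender row as a consistency check. The function name `odds_ratio_summary` is hypothetical, and the 1.96 multiplier assumes a normal-approximation interval, which the source does not state explicitly.

```python
import math


def odds_ratio_summary(beta, se, z=1.96):
    """Return (OR, CI lower, CI upper, Wald chi-square) for one coefficient."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)   # lower bound of the 95% CI for the OR
    upper = math.exp(beta + z * se)   # upper bound of the 95% CI for the OR
    wald = (beta / se) ** 2           # Wald chi-square with 1 degree of freedom
    return odds_ratio, lower, upper, wald


# Gender row of Table 2 (male vs. female): beta = -0.557, SE = 0.123
or_, lo, hi, wald = odds_ratio_summary(-0.557, 0.123)
print(f"OR={or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}, Wald={wald:.2f}")
# OR=0.57, 95% CI 0.45-0.73, Wald=20.51
```

The OR and interval reproduce the tabulated 0.57 (0.45, 0.73). The Wald value differs slightly from the tabulated 20.577, presumably because the published β and SE are rounded to three decimals.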