
CN 34-1304/R  ISSN 1674-3679

Volume 27 Issue 3
Mar.  2023
WANG Wen-jie, MA Jin-sha, GAO Qian, WANG Tong. Variable selection methods based on variable importance measurement from random forest and its application in diagnosis of tumor typing[J]. CHINESE JOURNAL OF DISEASE CONTROL & PREVENTION, 2023, 27(3): 274-280. doi: 10.16462/j.cnki.zhjbkz.2023.03.005

Variable selection methods based on variable importance measurement from random forest and its application in diagnosis of tumor typing

doi: 10.16462/j.cnki.zhjbkz.2023.03.005
Funds:

National Natural Science Foundation of China 81872715

National Natural Science Foundation of China 82073674

Major Science and Technology Project of Shanxi Province 202005D121008

Major Science and Technology Project of Shanxi Province 202102130501003

  • Corresponding author: WANG Tong, E-mail: tongwang@sxmu.edu.cn
  • Received Date: 2022-02-18
  • Revision Received Date: 2022-05-23
  • Available Online: 2023-04-04
  • Publish Date: 2023-03-10
  • Objective: To evaluate variable selection methods based on random forest (RF) variable importance measures for binary outcomes in high-dimensional omics data, and to choose appropriate methods for constructing an outcome prediction model. Methods: First, according to the two variable selection objectives, we used simulations to compare how well minimal-optimal RF variable selection methods [recursive feature elimination (RFE)-RF, biosigner] and all-relevant RF variable selection methods (Boruta, vita, altmann and r2vim) identify important variables in high-dimensional data. We then combined different methods to select genes related to diffuse large B-cell lymphoma (DLBCL) classification and constructed a diagnostic model for DLBCL classification. Results: In the simulation study, vita had higher sensitivity and biosigner had higher positive predictive value. In the empirical study, the vita method identified 1,019 genes related to DLBCL classification and the biosigner method identified 77. The area under the receiver operating characteristic (ROC) curve (AUC) of the DLBCL typing diagnostic model was 0.910. Conclusions: Vita and biosigner can be used in the preliminary and final stages, respectively, of selecting genes related to DLBCL classification. The model we developed can effectively distinguish the different subtypes of DLBCL.
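
The evaluation criteria named in the abstract (sensitivity and positive predictive value of a selected variable set, and AUC of a classifier) can be illustrated with a minimal sketch. This is not the authors' code: the function names `selection_metrics` and `auc`, the gene labels, and all toy numbers are hypothetical and chosen only to show how a liberal, all-relevant style selection and a strict, minimal-optimal style selection trade sensitivity against positive predictive value.

```python
def selection_metrics(selected, truly_relevant):
    """Return (sensitivity, PPV) of a selected variable set.

    sensitivity = truly relevant variables selected / all truly relevant
    PPV         = truly relevant variables selected / all selected
    """
    selected, truly_relevant = set(selected), set(truly_relevant)
    true_pos = len(selected & truly_relevant)
    sensitivity = true_pos / len(truly_relevant) if truly_relevant else float("nan")
    ppv = true_pos / len(selected) if selected else float("nan")
    return sensitivity, ppv


def auc(labels, scores):
    """AUC as the probability that a positive case scores higher than a
    negative case (Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical simulation truth and two hypothetical selections (toy data).
truth = {"g1", "g2", "g3", "g4"}
liberal = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}  # keeps everything plausible
strict = {"g1", "g2"}                                        # keeps a small signature

print(selection_metrics(liberal, truth))  # high sensitivity, lower PPV
print(selection_metrics(strict, truth))   # lower sensitivity, high PPV
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In this toy setting the liberal selection recovers every relevant gene at the cost of false positives, while the strict selection contains no false positives but misses relevant genes, which is the pattern that motivates using one kind of method for preliminary screening and the other for the final signature.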