Academic Article

Risk Prediction of Diabetes Progression Using Big Data Mining with Multifarious Physical Examination Indicators
Document Type
article
Source
Diabetes, Metabolic Syndrome and Obesity, Vol 17, Pp 1249-1265 (2024)
Subject
prediabetes
prediction model
physical examination
machine learning
regression analysis
Specialties of internal medicine
RC581-951
Language
English
ISSN
1178-7007
Abstract
Xiaohong Chen,1,* Shiqi Zhou,2,* Lin Yang,1 Qianqian Zhong,1 Hongguang Liu,3 Yongjian Zhang,1 Hanyi Yu,2 Yongjiang Cai1
1 Center of Health Management, Peking University Shenzhen Hospital, Shenzhen, People’s Republic of China
2 School of Future Technology, South China University of Technology, Guangzhou, People’s Republic of China
3 Center of Health Management, Huazhong University of Science and Technology Union Hospital (Nanshan Hospital), Shenzhen, People’s Republic of China
* These authors contributed equally to this work
Correspondence: Yongjiang Cai, Center of Health Management, Peking University Shenzhen Hospital, Shenzhen, 518036, People’s Republic of China, Email caiyj2000@sina.cn
Hanyi Yu, School of Future Technology, South China University of Technology, Guangzhou, People’s Republic of China, Email yuhanyi@scut.edu.cn

Purpose: This study explores the independent influencing factors for progression from normal status to prediabetes and from prediabetes to diabetes, and uses different prediction models to build diabetes prediction models.

Methods: The original data in this retrospective study are collected from participants who took physical examinations in the Health Management Center of Peking University Shenzhen Hospital. For feature selection, regression analysis is applied separately to the normal and prediabetes populations and to the prediabetes and diabetes populations. The independent influencing factors identified in this way are then used as predictive factors to construct a prediction model.

Results: Physical examination indicators are selected through univariate and multivariate logistic regression and used to train different ML models. Age, PRO, TP, and ALT are four independent risk factors for normal people to develop prediabetes, and GLB and HDL.C are two independent protective factors; logistic regression performs best on the testing set (Acc: 0.76, F-measure: 0.74, AUC: 0.78). Age, Gender, BMI, SBP, U.GLU, PRO, ALT, and TG are independent risk factors for people with prediabetes to develop diabetes, and AST is an independent protective factor; logistic regression again performs best on the testing set (Acc: 0.86, F-measure: 0.84, AUC: 0.74).

Conclusion: The discussion of the clinical relationships between these indicators and diabetes supports the interpretability of our feature selection. Among four prediction models, the logistic regression model achieved the best performance on the testing set.

Keywords: prediabetes, prediction model, physical examination, machine learning, regression analysis
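
For readers unfamiliar with the three testing-set metrics quoted in the Results (Acc, F-measure, AUC), the following is a minimal sketch of how such values are commonly computed for a binary classifier. It is not the authors' code: the labels and scores are made-up toy values, the function names are hypothetical, the 0.5 decision threshold is an assumption, and the F-measure is assumed here to be the binary F1 score, which the abstract does not specify.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not from the study):
# 1 = progressed, 0 = did not progress.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]  # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("Acc:", round(accuracy(y_true, y_pred), 2))
print("F-measure:", round(f_measure(y_true, y_pred), 2))
print("AUC:", round(auc(y_true, scores), 2))
```

Accuracy and F-measure depend on the chosen threshold, whereas AUC is computed from the raw scores and is threshold-independent. This is one reason a model can show a higher accuracy alongside a lower AUC, as in the second set of results above.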