Predictive Data Mining Approaches for Diabetes Mellitus Type II Disease
DOI:
https://doi.org/10.56225/ijgoia.v1i2.22
Keywords:
data mining, diabetes mellitus, Naïve Bayes, artificial neural network, logistic regression, decision tree
Abstract
Diabetes is a major public health problem, especially in developing countries, and is caused by abnormal insulin secretion in the human body. It is a common disease that can lead to several health complications and to mortality. In Malaysia, most cases are categorized as Diabetes Mellitus (DM) Type II. The number of patients with diabetes increases from year to year owing to unhealthy lifestyle factors such as smoking, being overweight, and hypertension. This study therefore aims to identify the influential factors that may contribute to DM Type II by comparing the performance of different data mining approaches. Between April 2017 and November 2018, 684 patients from a public clinic participated in this retrospective cross-sectional study. The four predictive models in the study are Logistic Regression, Decision Tree, Naïve Bayes, and Artificial Neural Network (ANN). Error measures (Average Squared Error and Misclassification Rate) and the ROC Index are used to evaluate the performance of the models. The results show that stepwise Logistic Regression outperformed the other predictive models, with a classification accuracy of 73%, and correctly predicted the positive outcome (Y=1) in 90% of cases. The significant inputs affecting the DM Type II prediction (Y=1) are Hypertension and Glycated Hemoglobin (HbA1c), and the model's Root Mean Squared Error (RMSE) is 0.424. The study may contribute to improving strategies and planning for diabetes in Malaysia.
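The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the study's code or data: the labels and predicted probabilities below are made up, the function names are hypothetical, the 0.5 classification cutoff is an assumption, and the ROC Index is taken here to mean the area under the ROC curve. If RMSE is the square root of the Average Squared Error, as is conventional, the reported RMSE of 0.424 would correspond to an Average Squared Error of roughly 0.18.

```python
import math


def average_squared_error(y_true, p_hat):
    """Mean squared difference between the 0/1 outcome and the predicted probability."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def misclassification_rate(y_true, p_hat, cutoff=0.5):
    """Share of cases whose predicted class (p >= cutoff) differs from the outcome."""
    wrong = sum(int(p >= cutoff) != y for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)


def roc_index(y_true, p_hat):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive case gets the higher score; ties count as half."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = DM Type II) and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
p_hat = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

ase = average_squared_error(y_true, p_hat)
mis = misclassification_rate(y_true, p_hat)
auc = roc_index(y_true, p_hat)
rmse = math.sqrt(ase)  # assumed relation: RMSE is the square root of ASE

print(f"ASE={ase:.4f}  RMSE={rmse:.4f}  misclassification={mis:.2f}  ROC index={auc:.4f}")
```

Lower Average Squared Error and Misclassification Rate and a higher ROC Index indicate a better model, so a comparison of several models would compute these values for each candidate on the same held-out data.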
Copyright (c) 2022 by the authors
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright © 2022. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted copying and redistribution of the material in any medium or format, as well as remixing, transforming, and building upon the material for any purpose, even commercially.