Predictive Data Mining Approaches for Diabetes Mellitus Type II Disease

Authors

  • Shahira Ibrahim, Department of Statistics and Decision Sciences, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia
  • Siti Shaliza Mohd Khairi, Department of Statistics and Decision Sciences, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia

DOI:

https://doi.org/10.56225/ijgoia.v1i2.22

Keywords:

data mining, diabetes mellitus, Naïve Bayes, artificial neural network, logistic regression, decision tree

Abstract

Diabetes is among the major public health problems, especially in developing countries, and is caused by abnormal insulin secretion in the human body. It is a common disease that can lead to several health complications and to mortality. In Malaysia, most cases are categorized as Diabetes Mellitus (DM) Type II. The number of patients with diabetes increases from year to year due to unhealthy lifestyle factors such as smoking, being overweight, and hypertension. Therefore, this study aims to identify the influential factors that may contribute to DM Type II by comparing the performance of different data mining approaches. Between April 2017 and November 2018, 684 patients from a public clinic participated in this retrospective cross-sectional study. The four predictive models in the study are Logistic Regression, Decision Tree, Naïve Bayes, and Artificial Neural Network (ANN). Error measures (Average Squared Error and Misclassification Rate) together with the ROC Index are used to evaluate model performance. Results show that stepwise Logistic Regression outperformed the other predictive models, with a classification accuracy of 73%, and correctly predicted the positive outcome (Y=1) in 90% of cases. The significant inputs that affect DM Type II prediction (Y=1) are Hypertension and Glycated Hemoglobin (HbA1c), with a model Root Mean Squared Error (RMSE) of 0.424. The findings may contribute to improving strategies and planning for diabetes in Malaysia.
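
The evaluation measures named in the abstract can be illustrated with a minimal sketch. The toy outcomes and predicted probabilities below are invented for illustration and are not the study's data; the function names and the 0.5 cutoff are assumptions, and the abstract does not say which software or cutoff the authors used. The sketch also assumes RMSE is the square root of the Average Squared Error, in which case the reported RMSE of 0.424 would correspond to an ASE of about 0.18.

```python
import math

def average_squared_error(y_true, p_hat):
    """Mean squared difference between 0/1 outcomes and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def misclassification_rate(y_true, p_hat, cutoff=0.5):
    """Share of cases whose predicted class (p >= cutoff) differs from the outcome."""
    wrong = sum(int(p >= cutoff) != y for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)

def sensitivity(y_true, p_hat, cutoff=0.5):
    """Share of true positives (Y=1) that are predicted as positive."""
    positives = [p for y, p in zip(y_true, p_hat) if y == 1]
    return sum(p >= cutoff for p in positives) / len(positives)

def roc_index(y_true, p_hat):
    """Area under the ROC curve: the probability that a random positive case
    scores higher than a random negative case, with ties counted as half."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = DM Type II, 0 = no DM Type II.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
p_hat = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

ase = average_squared_error(y_true, p_hat)
rmse = math.sqrt(ase)  # assumed relationship: RMSE = sqrt(ASE)
mcr = misclassification_rate(y_true, p_hat)
sens = sensitivity(y_true, p_hat)
auc = roc_index(y_true, p_hat)

print(f"ASE={ase:.4f} RMSE={rmse:.4f} misclassification={mcr:.2f}")
print(f"accuracy={1 - mcr:.2f} sensitivity={sens:.2f} ROC index={auc:.4f}")
```

In this sketch, accuracy is simply one minus the misclassification rate, and sensitivity corresponds to the abstract's "predict positive outcome (Y=1) correctly" figure.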

Published

2022-06-30

How to Cite

Ibrahim, S., & Khairi, S. S. M. (2022). Predictive Data Mining Approaches for Diabetes Mellitus Type II Disease. International Journal of Global Optimization and Its Application, 1(2), 126–134. https://doi.org/10.56225/ijgoia.v1i2.22
