Diabetes Risk Prediction using Logistic Regression Algorithm

Authors

  • Qatrunnada Refa Cahyani, Universitas Diponegoro, Semarang, Indonesia
  • Mochammad Januar Finandi, Universitas Nasional, Jakarta, Indonesia
  • Jathu Rianti, Universitas Diponegoro, Semarang, Indonesia
  • Devi Lestari Arianti, Politeknik Elektronika Negeri Surabaya, Indonesia
  • Arya Dwi Pratama Putra, Universitas Nasional, Jakarta, Indonesia

DOI:

https://doi.org/10.55123/jomlai.v1i2.598

Keywords:

Diabetes, Logistic Regression, Recall, Confusion Matrix

Abstract

Many factors are associated with diabetes, including high blood pressure, elevated blood sugar levels, body weight, a family history of diabetes, age, number of pregnancies, skinfold thickness, and insulin levels in the body. Logistic regression is a statistical method that can be used to build a classification model for the presence or absence of diabetes. This study aims to predict whether or not a patient has diabetes. The initial predictions performed relatively poorly because the value ranges of several predictor variables differ widely, so normalization was applied to bring the ranges closer together. Diabetes risk prediction using logistic regression achieved a recall of 55% with normalization, compared with 43% without it. Thus, normalization can improve the performance of diabetes risk prediction using a logistic regression algorithm. The model is expected to serve as a reference for doctors treating diabetic patients in hospitals, and to help the community understand how to maintain a healthy lifestyle and avoid diabetes with respect to the variables that influence the disease.
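The comparison described in the abstract (logistic regression with and without normalization, scored by recall) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not state which normalization method, solver, or hyperparameters were used, so min-max scaling, batch gradient descent, the learning rate, the epoch count, and the six toy rows are all assumptions made here. Recall is computed on the training rows only to show the mechanics; an actual study would evaluate on held-out data.

```python
import math


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def recall(y_true, y_pred):
    """Recall = TP / (TP + FN); 0.0 when there are no positive cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy rows (not taken from the study's dataset): one feature on a
# scale of hundreds and one on a scale below 1, to mimic widely differing ranges.
raw_X = [[85, 0.35], [95, 0.20], [110, 0.60],
         [150, 0.45], [170, 0.90], [190, 0.70]]
labels = [0, 0, 0, 1, 1, 1]

for name, X in (("raw", raw_X), ("min-max", min_max_normalize(raw_X))):
    w, b = fit_logistic(X, labels)
    print(name, "recall:", recall(labels, predict(X, w, b)))
```

Recall is the natural metric to watch in this setting because a false negative (a diabetic patient predicted as healthy) is usually the more costly error. The numbers printed by this sketch depend entirely on the toy data and settings and say nothing about the 55% and 43% figures reported above.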

Published

2022-07-28

How to Cite

Cahyani, Q. R., Finandi, M. J., Rianti, J., Arianti, D. L., & Putra, A. D. P. (2022). Diabetes Risk Prediction using Logistic Regression Algorithm. JOMLAI: Journal of Machine Learning and Artificial Intelligence, 1(2), 107–114. https://doi.org/10.55123/jomlai.v1i2.598

Section

Articles