Logistic Regression's Effectiveness in Feature Selection with Information Gain in Predicting Heart Failure Patients

  • Mochammad Anshori Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW
  • M. Syauqi Haris Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW
  • Arif Wahyudi Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW
Keywords: Heart Failures, Information Gain, Feature Selection, Prediction, Logistic Regression

Abstract

Heart failure is a chronic condition in which the heart cannot pump enough blood to circulate oxygen through the body. Patients with heart failure have a poor chance of survival, as the high death rate shows. Patient survival depends on the hospital's infrastructure and medical facilities, and patients' medical records play a significant role in ensuring that they receive the right care. A system that uses such data to predict the survival of heart failure patients is therefore needed. Machine learning is one way to build it, and earlier studies have used the logistic regression algorithm to generate these predictions. This study proposes information gain as the feature selection method for the dataset. Information gain keeps only the features that are informative for the prediction target, and reducing the dimensionality of the data in this way can improve the effectiveness of machine learning. Information gain selected five features: time, serum creatinine, ejection fraction, age, and serum sodium. Logistic regression was then trained on these features, and with a split of 70% training data and 30% test data it reached an accuracy of 0.8556. This result indicates that feature selection with information gain is an effective method that can improve the accuracy of a logistic regression model.
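The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each numeric feature is split at its median before the entropy reduction is computed, since the abstract does not say how continuous features were discretized. The toy rows, the `noise` column, and the function names are all hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting a numeric feature at its median.

    The median split is an assumption made for this sketch.
    """
    threshold = sorted(values)[len(values) // 2]
    low = [y for x, y in zip(values, labels) if x < threshold]
    high = [y for x, y in zip(values, labels) if x >= threshold]
    # Weighted entropy of the two partitions (empty partitions are skipped).
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - remainder


def top_k_features(columns, labels, k):
    """Return the k feature names with the highest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Hypothetical toy data: 1 = patient died, 0 = patient survived.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "time": [10, 20, 30, 40, 150, 200, 210, 250],
    "ejection_fraction": [20, 25, 30, 45, 35, 40, 50, 60],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],  # made-up uninformative column
}

selected = top_k_features(columns, labels, k=2)
print(selected)  # ['time', 'ejection_fraction']
```

In a full pipeline, only the selected columns would be passed to the logistic regression model, which would be fitted on the 70% training portion and scored on the remaining 30%.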

Published
2024-07-31
How to Cite
Anshori, M., Haris, M. S., & Wahyudi, A. (2024). Logistic Regression’s Effectiveness in Feature Selection with Information Gain in Predicting Heart Failure Patients. Journal of Enhanced Studies in Informatics and Computer Applications, 1(2), 35-39. https://doi.org/10.47794/jesica.v1i2.8