An Enhanced and Pertinent Diagnostic System for Diabetes Mellitus Using Machine Learning
Keywords:
Diabetes Diagnosis, Machine Learning, Ensemble Learning, Feature Selection, Boruta Algorithm, Stacking Classifier, Explainable AI, PIMA Dataset, Random Forest, SMOTE.
Abstract
Diabetes mellitus (DM) has emerged as one of the most significant public health challenges of the 21st century, affecting an estimated 463 million adults worldwide in 2019, with projections of 700 million by 2045. As a chronic metabolic disorder characterized by elevated blood glucose, diabetes is associated with severe complications, including cardiovascular disease, kidney failure, neuropathy, and vision loss, which makes early and accurate diagnosis critical for effective intervention. This manuscript presents an enhanced and pertinent diagnostic system for diabetes that integrates advanced machine learning techniques to achieve high predictive accuracy while maintaining clinical interpretability. The proposed methodology comprises a five-stage pipeline: (1) robust data preprocessing, including outlier detection and handling of class imbalance; (2) Boruta-based feature selection to identify the most salient predictors; (3) K-Means++ clustering for data stratification; (4) stacking ensemble learning that combines multiple base classifiers; and (5) explainable AI frameworks (LIME and SHAP) for model transparency. Experimental validation on the PIMA Indian Diabetes Dataset (PIDD) shows that the proposed stacking ensemble achieves 98% accuracy, substantially outperforming single classifiers, including Logistic Regression (77%), Random Forest (86.76%), and XGBoost. A Boruta-SMOTE-ENN-Tabu model further identifies critical risk factors, including family history, age, central obesity, hyperlipidemia, and body mass index. Among the individual classifiers, Random Forest is the most effective, achieving the highest accuracy. Integrating particle swarm optimization (PSO)-optimized weighted majority voting yields 93.22% accuracy with 94.12% precision.
This research contributes a clinically viable, interpretable, and high-performance diagnostic system capable of early diabetes detection, thereby enabling timely intervention and improved patient outcomes.
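The stacking stage (4) summarized above can be illustrated with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions, not the authors' implementation: the base learners (Random Forest, k-nearest neighbours, Logistic Regression), the logistic-regression meta-learner, and every hyperparameter are hypothetical choices. Synthetic data from `make_classification` stands in for PIDD and matches only its shape (768 records, 8 features) and approximate class ratio. Outlier handling, Boruta selection, K-Means++ stratification, SMOTE, and the LIME/SHAP stage are omitted.

```python
# Minimal stacking sketch (assumption: scikit-learn >= 0.22 is installed).
# Synthetic data mimics only PIDD's shape (768 rows, 8 features, ~65/35 classes);
# it is NOT the real dataset, so the accuracy says nothing about the paper's results.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical base learners; distance- and margin-based models get feature scaling.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# The meta-learner is fitted on out-of-fold base-learner probabilities (cv=5),
# so it never learns from predictions made on rows a base learner was trained on.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Fitting the meta-learner on cross-validated predictions rather than in-sample ones is the design choice that keeps stacking from simply rewarding whichever base learner overfits the training rows most.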
License
Copyright (c) 2026 INTESAB AALAMI

This work is licensed under a Creative Commons Attribution 4.0 International License.
