Yerasi Jayasimha Reddy, Yerasi Kedarnath Reddy, Rahul David, Rahul Vasishta, Anilkumar Ambore


Machine learning is a branch of AI and is used to solve many problems in data science. A model learns patterns from existing data and then applies them to unseen data to predict an outcome. Classification is a powerful machine learning method commonly used for prediction. Some classification algorithms achieve satisfactory accuracy, while others achieve only limited accuracy. This paper examines ensemble classification, a method often used to improve the accuracy of weak algorithms by combining multiple classifiers. Experiments are performed on a diabetes dataset. A comparative analysis was carried out to determine how ensemble methods can improve diabetes prediction. The goal of this paper is not only to increase the accuracy of weak classification algorithms, but also to apply the approach to a medical dataset and demonstrate its ability to detect the disease at an early stage. The results indicate that ensemble techniques, such as random forest, are effective in increasing the predictive accuracy of weak classifiers and perform satisfactorily in identifying the risk of diabetes. With the help of ensemble classification, a seven-point increase in the accuracy of the weak classifiers was achieved.
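The idea of combining weak classifiers can be illustrated with a minimal sketch. It is not the paper's implementation: it uses bagged decision stumps with majority voting as a simplified stand-in for random forest, and a synthetic two-feature dataset in place of the diabetes data. All function names (`make_data`, `fit_stump`, `fit_bagged`, and so on) are hypothetical and chosen only for this illustration.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Synthetic data (an assumption, not the paper's dataset):
    two features in [0, 1), label 1 when their sum exceeds 1."""
    rows = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        rows.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return rows


def predict_stump(stump, x):
    """A stump is (feature index, threshold, label predicted above the threshold)."""
    feature, threshold, above = stump
    return above if x[feature] > threshold else 1 - above


def fit_stump(rows):
    """Weak classifier: the single-feature threshold rule with the
    most correct predictions on the given rows."""
    best_correct, best_stump = -1, None
    for feature in range(len(rows[0][0])):
        for x, _ in rows:
            for above in (0, 1):
                stump = (feature, x[feature], above)
                correct = sum(1 for xi, yi in rows if predict_stump(stump, xi) == yi)
                if correct > best_correct:
                    best_correct, best_stump = correct, stump
    return best_stump


def fit_bagged(rows, n_estimators, rng):
    """Ensemble: train each stump on a bootstrap sample (drawn with replacement)."""
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_estimators)]


def predict_bagged(stumps, x):
    """Combine the stumps by majority vote."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for x, y in rows if predict(x) == y) / len(rows)


if __name__ == "__main__":
    rng = random.Random(0)
    train, test = make_data(200, rng), make_data(200, rng)
    stump = fit_stump(train)
    ensemble = fit_bagged(train, 25, rng)  # odd count avoids tied votes
    print("single stump:", accuracy(lambda x: predict_stump(stump, x), test))
    print("bagged stumps:", accuracy(lambda x: predict_bagged(ensemble, x), test))
```

Whether the ensemble outperforms the single weak classifier, and by how much, depends on the data and on how diverse the individual classifiers are. Random forest adds random feature selection and deeper trees on top of the bagging step shown here.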



Machine Learning; Classification; Random Forest; Ensemble Classification; Weak Classifiers




Ayman Mir, Sudhir N. Dhage. (2018). Diabetes Disease Prediction using Machine Learning on Big Data of Healthcare. Applies the Naive Bayes, Support Vector Machine, Random Forest and Simple CART algorithms in WEKA to predict diabetes. Random Forest achieves an accuracy of 78%, ahead of Naive Bayes, SVM and Simple CART.

V Mohan, R Deepa, M Deepa, S Somannavar, M Datta (2015). A Simplified Indian Diabetes Risk Score for Screening for Undiagnosed Diabetic Subjects. The Indian Diabetes Risk Score (IDRS) is developed from the results of multiple logistic regression analysis. Internal validation is performed on the same data. The IDRS uses four main risk factors: abdominal obesity, family history of diabetes, age and physical activity.

Rajawat, P. S., Gupta, D. K., Rathore, S. S., & Singh, A. (2018). Predictive Analysis of Medical Data using a Hybrid Machine Learning Technique. Proposes a hybrid machine learning approach to predict whether a person is at risk of diabetes. The hybrid technique achieves an accuracy of 87.33%, better than SVM, ANN and KNN.

Yahyaoui, A., Jamil, A., Rasheed, J., & Yesiltepe, M. (2019). A Decision Support System for Diabetes Prediction Using Machine Learning and Deep Learning Techniques. Machine learning algorithms (SVM, RF) and deep learning are used to predict diabetes. The results show that RF is more effective for classifying diabetes, with an overall prediction accuracy of 80.67%.

Ramzan, M. (2016). Comparing and evaluating the performance of WEKA classifiers on critical diseases. Compares the Naive Bayes, Random Forest and J48 Decision Tree classifiers for predicting critical diseases using the WEKA tool. Random Forest achieves higher accuracy than both J48 and Naive Bayes.

Ashwinkumar U. M. and Dr. Anandakumar K. R., "Predicting Early Detection of Cardiac and Diabetes Symptoms using Data Mining Techniques", International Conference on Computer Design and Engineering, vol. 49, 2012.

DOI: https://doi.org/10.26483/ijarcs.v12i0.6737



Copyright (c) 2021 International Journal of Advanced Research in Computer Science