WHEN DATA AUGMENTATION HURTS: A SYSTEMATIC EVALUATION OF SMOTE-BASED TECHNIQUES ON MEDICAL DATASETS

May Stow

Abstract

Data augmentation techniques, particularly the Synthetic Minority Over-sampling Technique (SMOTE) and its variants, are routinely applied to address class imbalance in medical datasets. However, the assumption that augmentation universally improves classification performance remains largely unvalidated. This study presents a systematic evaluation of four SMOTE-based augmentation methods across three medical datasets to determine when these techniques help or harm model performance. The research evaluated SMOTE, ADASYN, BorderlineSMOTE, and SVM-SMOTE on breast cancer diagnosis, heart disease prediction, and diabetes detection datasets, representing varying levels of class imbalance (ratios: 1.17 to 2.02) and baseline performance (F1 scores: 0.667 to 0.966). Random Forest classifiers were employed with both standard and regularized configurations to ensure robust findings. Each augmentation method underwent rigorous evaluation through 10 independent runs with statistical significance testing and effect size analysis. Results revealed that augmentation significantly degraded performance on the high-performing Breast Cancer dataset, with all methods showing statistically significant decreases (p < 0.05) and F1 scores dropping by up to 2.2%. Conversely, the Pima Diabetes dataset, characterized by lower baseline performance and higher imbalance, showed improvements of up to 4.76% with SVM-SMOTE. The Heart Disease dataset exhibited mixed results, with only ADASYN achieving a meaningful improvement. Analysis uncovered a strong negative correlation (r = -0.997) between baseline model performance and augmentation effectiveness, making baseline performance a more reliable predictor of augmentation outcome than the traditional class imbalance ratio.
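The mechanism all four evaluated methods share is SMOTE's core interpolation step: each synthetic minority sample is placed on the line segment between a real minority point and one of its k nearest minority-class neighbors. The sketch below illustrates that idea in plain NumPy; it is an illustrative reimplementation, not code from the study, and the function name and defaults are assumptions.

```python
import numpy as np

def smote_oversample(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority-class samples by interpolating between
    each chosen minority point and one of its k nearest minority neighbors
    (the core SMOTE idea; an illustrative sketch, not the study's code)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise Euclidean distances among minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude self-matches
    # indices of the k nearest minority neighbors of each sample
    nn = np.argsort(d, axis=1)[:, :k]
    out = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        a = rng.integers(n)               # pick a random minority sample
        b = nn[a, rng.integers(k)]        # pick one of its k neighbors
        lam = rng.random()                # interpolation factor in [0, 1)
        out[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return out
```

Because every synthetic point is a convex combination of two existing minority points, oversampling densifies the region between minority samples; when classes are already well separated, as on the Breast Cancer dataset, these interpolated points can land near the decision boundary and blur it.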


The study establishes an evidence-based decision framework: augmentation should be avoided when baseline F1 exceeds 0.95 or imbalance ratios fall below 1.5, considered for baseline F1 below 0.70 with imbalance ratios above 1.8, and carefully validated for intermediate cases. These findings challenge current practices of routine augmentation application and demonstrate that synthetic sample generation can blur decision boundaries in well-separated feature spaces. The research provides practitioners with validated guidelines for determining when augmentation techniques genuinely improve medical classifiers versus when they cause harm, ultimately supporting more effective development of clinical decision support systems.
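The decision framework stated above reduces to two thresholds on baseline F1 and the imbalance ratio. A minimal helper encoding those rules might look as follows; the function name and return labels are illustrative, with only the numeric thresholds taken from the abstract.

```python
def augmentation_advice(baseline_f1: float, imbalance_ratio: float) -> str:
    """Apply the abstract's decision framework for SMOTE-style augmentation.
    Illustrative helper, not code from the study."""
    if baseline_f1 > 0.95 or imbalance_ratio < 1.5:
        return "avoid"      # strong baseline or near-balanced: augmentation likely harms
    if baseline_f1 < 0.70 and imbalance_ratio > 1.8:
        return "consider"   # weak baseline with strong imbalance: augmentation may help
    return "validate"       # intermediate case: test augmentation carefully before use
```

For example, the Breast Cancer dataset (baseline F1 0.966) falls in the "avoid" region, while the Pima Diabetes dataset (baseline F1 0.667, imbalance ratio 2.02) falls in the "consider" region, matching the reported results.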
