Soumya Kanta Sarkar
Akash Nag


In this paper, we explore how the C4.5 algorithm can be applied to breast cancer datasets to extract and formulate rules for identifying risk factors. For this study, we used the Wisconsin breast cancer dataset, which contains 9 attributes describing various cell features and anomalies. We applied the C4.5 algorithm to this dataset to build a decision tree, and from the inferred tree we derived rules for identifying patients at risk. With a training set of 200 patient records, our system achieved an accuracy of 96.7%.
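As a rough illustration of how C4.5 chooses the attribute to split on, the sketch below computes the gain ratio (information gain divided by split information), which is the criterion C4.5 uses to rank candidate attributes. It is a minimal sketch and not the implementation used in this study: the function names, the attribute name `clump`, and the six toy records are hypothetical. The attribute is also treated as discrete, whereas C4.5 handles numeric attributes by searching for a threshold.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute: information gain / split information."""
    total = len(labels)

    # Partition the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)

    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder

    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, NOT taken from the Wisconsin dataset.
clump = ["low", "low", "high", "high", "high", "low"]
diagnosis = ["benign", "benign", "malignant", "malignant", "malignant", "benign"]

ratio = gain_ratio(clump, diagnosis)
print(f"gain ratio = {ratio:.3f}")
```

In a full tree builder, this score would be computed for every candidate attribute at a node, the highest-scoring attribute would become the test at that node, and each root-to-leaf path of the finished tree would read as one if-then rule.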


