Main Article Content

Soumya Kanta Sarkar
Akash Nag


In this paper, we explore how the C4.5 algorithm can be applied to breast cancer data to extract rules for identifying risk factors. We use the Wisconsin breast cancer dataset, which contains 9 attributes describing various cell features and anomalies, and apply C4.5 to it to build a decision tree. From the inferred tree, we derive rules for identifying patients at risk. With a training set of 200 patient records, our system achieved an accuracy of 96.7%.
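
As a rough illustration of the idea summarized above, the following sketch computes the gain ratio (the split criterion used by C4.5) for a single numeric attribute and turns the best split into an IF-THEN rule. It is a simplified, single-split sketch and not the authors' implementation: the attribute name `clump_thickness`, the toy records, and the helper names are assumptions made for illustration, and candidate thresholds are taken as midpoints between consecutive values, which is a simplification.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values and keep the best one."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))

# Hypothetical toy records (NOT taken from the Wisconsin dataset).
clump_thickness = [1, 2, 3, 4, 7, 8, 9, 10]
diagnosis = ["benign"] * 4 + ["malignant"] * 4

t = best_threshold(clump_thickness, diagnosis)
rule = f"IF clump_thickness <= {t} THEN benign ELSE malignant"
print(rule)
```

A full decision tree would repeat this selection recursively on each resulting subset and over all attributes; each root-to-leaf path then reads as one rule of the kind printed here.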


