RFKNN: ROUGH-FUZZY KNN FOR BIG DATA CLASSIFICATION

Mohamed A. Mahfouz

doi:10.26483/ijarcs.v9i2.5667

Published: Apr 20, 2018

Keywords:

classification, kNN, big data, clustering, fuzzy sets, rough sets

Mohamed A. Mahfouz

Ph.D., Faculty of Engineering, Alexandria University, Egypt

Abstract

K-nearest neighbors (kNN) is a lazy-learning method for classification and regression that has been applied successfully in several application domains. It is simple and directly applicable to multi-class problems; however, it suffers from high complexity in both memory and computation. Several research studies have tried to scale kNN to very large datasets using crisp partitioning. In this paper, we propose integrating the principles of rough sets and fuzzy sets into a clustering algorithm that separates the whole dataset into several parts, and kNN classification is then conducted on each part. Applying the concepts of a crisp lower bound and a fuzzy boundary of a cluster allows accurate selection of the set of data points involved in classifying an unseen data point. The data points used are a mix of core and border data points of the clusters created in the training phase. Experimental results on standard datasets show that the proposed kNN classification is more effective than related recent work, with a slight increase in classification time.
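The idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the membership function, or any thresholds, so everything below is an assumption made for illustration: plain k-means stands in for the clustering step, a fuzzy c-means style membership defines the fuzzy boundary, and the names `kmeans`, `memberships`, `build_regions`, `classify`, and the threshold `tau` are hypothetical. It is not the author's implementation.

```python
import math
from collections import Counter

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations from given initial centers (assumed clustering step)."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[j].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def memberships(p, centers, m=2.0):
    """Fuzzy c-means style membership of point p in each cluster (assumed form)."""
    d = [max(math.dist(p, c), 1e-12) for c in centers]
    inv = [x ** (-2.0 / (m - 1.0)) for x in d]
    total = sum(inv)
    return [v / total for v in inv]

def build_regions(X, y, centers, tau=0.2):
    """Training phase: each cluster keeps its core points plus its border points.

    A point is stored with its best cluster, and also with any other cluster
    in which its membership is at least tau (hypothetical boundary rule).
    Points stored in exactly one cluster play the role of core points.
    """
    regions = [[] for _ in centers]
    for p, label in zip(X, y):
        u = memberships(p, centers)
        best = max(range(len(u)), key=u.__getitem__)
        for j, uj in enumerate(u):
            if j == best or uj >= tau:
                regions[j].append((p, label))
    return regions

def classify(q, centers, regions, k=3):
    """Classify q with kNN restricted to the region of its nearest center."""
    j = min(range(len(centers)), key=lambda c: math.dist(q, centers[c]))
    neighbours = sorted(regions[j], key=lambda pl: math.dist(q, pl[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Toy data: two groups with one point from each lying between them.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 4),
     (6, 6), (9, 9), (9, 10), (10, 9), (10, 10)]
y = ["a"] * 5 + ["b"] * 5

centers = kmeans(X, [(0.0, 0.0), (10.0, 10.0)])
regions = build_regions(X, y, centers, tau=0.2)
print(classify((0.5, 0.5), centers, regions))
print(classify((9.5, 9.5), centers, regions))
```

In this toy run the two in-between points, (4, 4) and (6, 6), end up stored with both clusters, so a query near the border sees neighbours from either side, while a query deep inside a cluster is compared only against that cluster's points. Raising `tau` shrinks the shared border and moves the sketch toward crisp partitioning.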

Issue

Vol. 9 No. 2 (2018): March-April 2018

Section

Articles

COPYRIGHT

Submission of a manuscript implies: that the work described has not been published before, that it is not under consideration for publication elsewhere; that if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.

Author Biography

Mohamed A. Mahfouz, Ph.D., Faculty of Engineering, Alexandria University, Egypt

Mohamed A. Mahfouz is a guest assistant professor in the Computer & Communication Engineering program, SSP, Faculty of Engineering, Alexandria University. He received the B.Sc., M.Sc., and Ph.D. degrees in Computer and Systems Engineering from the University of Alexandria, Egypt, in 1989, 1996, and 2009, respectively. He has published several papers in the areas of bioinformatics and machine learning. He is also a recognized reviewer for Elsevier and has reviewed several papers for other ranked journals.
