Manisha Goyal, Shruti Aggarwal


The main purpose of data mining is to extract useful information from large datasets. Clustering, one of the most important tasks in data mining, groups data objects by their attributes so that objects in the same group are more similar to one another than to objects in other groups. It is a form of unsupervised learning, meaning that the way similar data objects should be grouped together is not known in advance. Commonly used clustering algorithms include the k-means, k-medoid, and k-mode algorithms. The K-Mode Algorithm is a well-known extension of the K-Means Algorithm for clustering datasets with categorical attributes, and it is valued for its simplicity and speed. It uses the ‘Simple Matching Dissimilarity’ measure instead of Euclidean distance and the ‘Mode’ of each cluster instead of the ‘Mean’. This paper reviews the K-Mode Algorithm.
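The two substitutions named in the abstract can be illustrated with a minimal Python sketch. It assumes each record is a tuple of categorical values of equal length; the function names `matching_dissimilarity`, `cluster_mode`, and `assign` are hypothetical and chosen only for illustration, and the sketch shows one assignment step rather than the full iterative algorithm.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Mode of a non-empty cluster: the most frequent value of each attribute."""
    # zip(*records) turns the list of records into one column per attribute.
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def assign(records, modes):
    """Assign each record to the index of the mode with the smallest dissimilarity."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
        for r in records
    ]


# Illustrative data (made up for this sketch).
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
mode = cluster_mode(cluster)  # most frequent value in each column
labels = assign(
    [("red", "small", "square"), ("blue", "large", "round")],
    [("red", "small", "round"), ("blue", "large", "square")],
)
```

In this sketch the mode plays the role that the mean plays in k-means: it is recomputed from the records currently assigned to a cluster, and records are then reassigned to the nearest mode. Ties in frequency or in distance are resolved here by first occurrence, which is only one possible choice.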


Data Mining; Clustering; K-Means Algorithm; K-Mode Algorithm





DOI: https://doi.org/10.26483/ijarcs.v8i7.4301



Copyright (c) 2017 International Journal of Advanced Research in Computer Science