
Manisha Goyal
Shruti Aggarwal


The main purpose of data mining is to extract useful information from large datasets. Clustering, one of the most important tasks in data mining, groups data objects by their attributes and features so that objects in the same group are more similar to one another than to objects in other groups. It is a form of unsupervised learning: how the data objects should be grouped is not known in advance. Commonly used clustering algorithms include the k-means, k-medoids and k-modes algorithms. The K-Modes algorithm is a well-known extension of the K-Means algorithm for clustering data sets with categorical attributes, and it is valued for its simplicity and speed. It uses the ‘Simple Matching Dissimilarity’ measure in place of Euclidean distance and the ‘Mode’ of each cluster in place of the ‘Mean’. This paper reviews the K-Modes algorithm.
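The two substitutions described above (simple matching dissimilarity for Euclidean distance, per-attribute mode for the mean) can be illustrated with a minimal Python sketch. The function names, the toy data and the choice of initial modes are assumptions made for this illustration only; they are not taken from the paper, and practical implementations select the initial modes more carefully.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_modes, max_iter=100):
    """Alternate assignment and mode update until the labels stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest mode.
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data set with three categorical attributes.
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]

# Assumption: two of the objects serve as the initial modes.
labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
print(labels)
print(modes)
```

On this toy input the three "red" objects end up in one cluster and the three "blue" objects in the other. As with k-means, the result depends on the initial modes.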




