Sakshi Siva Ramakrishna, Anuradha Talasila


Abstract: Data clustering partitions a dataset into homogeneous subsets such that each subset is dissimilar to the others. K-means is a familiar approach to data clustering, particularly when all attributes of the data objects are numeric. Although k-means is popular and efficient, it is prone to misclassifying data because of the noise and outliers that are common in datasets. This paper studies the strategies available for overcoming problems such as high dimensionality, redundancy, noise, and outliers when implementing the k-means algorithm, and proposes an improved approach. An iterative attribute reduction procedure based on correlations among attributes is proposed to improve k-means clustering of a given dataset. The proposed methodology was tested on the standard “Iris” dataset, and the results obtained were reasonably better.
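
The abstract does not spell out the procedure, so the following is only a minimal sketch of what correlation-based iterative attribute reduction followed by k-means might look like. The function names (`pearson`, `reduce_attributes`, `kmeans`), the 0.9 threshold, the rule for choosing which attribute to drop, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def reduce_attributes(rows, threshold=0.9):
    """Repeatedly drop one attribute of the most correlated pair.

    Assumption: stopping once max |r| <= threshold, and dropping the
    later attribute of the pair, are illustrative choices only.
    """
    keep = list(range(len(rows[0])))
    while len(keep) > 1:
        cols = {j: [r[j] for r in rows] for j in keep}
        r, a, b = max(
            (abs(pearson(cols[a], cols[b])), a, b)
            for i, a in enumerate(keep)
            for b in keep[i + 1:]
        )
        if r <= threshold:
            break
        keep.remove(b)
    return keep


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(sum(v) / len(members) for v in zip(*members))
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy data (hypothetical): the third attribute is twice the first,
# so it is redundant and should be removed before clustering.
rows = [
    (1.0, 0.5, 2.0), (1.2, 0.1, 2.4), (0.8, 0.3, 1.6),
    (5.0, 0.4, 10.0), (5.2, 0.2, 10.4), (4.8, 0.35, 9.6),
]
keep = reduce_attributes(rows)
reduced = [tuple(r[j] for j in keep) for r in rows]
labels, centers = kmeans(reduced, k=2)
```

On this toy data the redundant third attribute is dropped and k-means then runs on the two remaining attributes. In practice the threshold would need tuning per dataset, and attributes would usually be normalized before distances are computed.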


Keywords: clustering, dimensionality reduction, modified k-means, outliers, redundancy, Iris

Copyright (c) 2018 International Journal of Advanced Research in Computer Science