An Efficient Selection of Initial Cluster in K-Means Using Entropy and Co-efficient of Variation for High Dimensional Data


P. Gokila
Dr. P. Krishnakumari


K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean. On high-dimensional datasets, K-means often fails to produce good clusters, so feature selection methods are normally required to remove irrelevant features. The proposed algorithm instead selects primary and secondary axes based on the mean and variation of each individual column, and combines an entropy value with these statistics during axis selection. This helps identify the axes that are most relevant and most highly ranked. Integrating the two techniques achieves feature selection and initial centroid selection simultaneously. Experiments on real-time high-dimensional datasets show that the proposed algorithm provides better results for high-dimensional data.

Keywords: K-means algorithm, initial cluster centers, high dimensional dataset, error percentage, entropy.
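
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how entropy is combined with the mean and variation, so the product of the coefficient of variation and a histogram-based entropy is an assumption here, as are the equal-width binning, the splitting of points into k groups along the primary axis, and all function names (`coefficient_of_variation`, `column_entropy`, `rank_axes`, `initial_centroids`).

```python
import numpy as np


def coefficient_of_variation(X):
    """Per-column CV = std / |mean|; columns with zero mean get CV 0."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    denom = np.abs(mean)
    return np.divide(std, denom, out=np.zeros_like(std), where=denom > 0)


def column_entropy(X, bins=10):
    """Shannon entropy (in bits) of each column after equal-width binning."""
    ent = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts[counts > 0] / counts.sum()
        ent[j] = -(p * np.log2(p)).sum()
    return ent


def rank_axes(X, bins=10):
    """Return (primary, secondary) column indices by descending score.

    ASSUMPTION: score = CV * entropy. The abstract only says the two
    quantities are combined, not how.
    """
    score = coefficient_of_variation(X) * column_entropy(X, bins)
    order = np.argsort(score)[::-1]
    return order[0], order[1]


def initial_centroids(X, k, bins=10):
    """ASSUMPTION: sort points along the primary axis, split them into
    k contiguous groups, and use each group's mean as a starting centroid."""
    primary, _ = rank_axes(X, bins)
    order = np.argsort(X[:, primary], kind="stable")
    groups = np.array_split(order, k)
    return np.vstack([X[idx].mean(axis=0) for idx in groups])


if __name__ == "__main__":
    # Toy data: column 0 is constant, column 1 is widely spread,
    # column 2 varies only slightly.
    X = np.array([
        [5.0, 1.0, 10.0],
        [5.0, 2.0, 10.0],
        [5.0, 9.0, 10.1],
        [5.0, 10.0, 10.1],
    ])
    print(rank_axes(X))
    print(initial_centroids(X, k=2))
```

The resulting array could then be handed to any K-means implementation that accepts explicit starting centers, for example scikit-learn's `KMeans(n_clusters=k, init=centroids, n_init=1)`.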

