Comparative Analysis of K-Means using MapReduce

Humam Siddiqui, Safdar Tanweer


Abstract: The continuous growth of digital data has drawn increasing attention to data mining techniques. The purpose of data mining is to analyze large data sets to extract knowledge and interesting patterns. Cluster analysis is an important data partitioning process that distributes data items into different groups (clusters) so that the items in each cluster share common characteristics. Data collected in real-time scenarios are often semi-structured or unstructured and must be processed to extract the hidden knowledge they contain. This is where clustering comes in. Among the various clustering algorithms, k-means is one of the simplest and most popular unsupervised learning algorithms and has been applied to many well-known clustering problems. The k-means clustering algorithm produces a specified number of disjoint clusters, starting from randomly selected cluster centers. In this paper, we implement the k-means clustering algorithm with different distance metrics in the MapReduce programming model running in a Hadoop distributed environment.
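
The idea in the abstract can be illustrated with a minimal sketch of k-means expressed as map and reduce steps with a pluggable distance metric. It is not the paper's implementation: it simulates the map, shuffle, and reduce phases in memory rather than on Hadoop, and the two metrics shown (Euclidean and Manhattan) as well as all function names are assumptions, since the abstract does not name them.

```python
import math
from collections import defaultdict

# Two common distance metrics (assumed; the abstract does not list the ones used).
def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def map_point(point, centers, distance):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: distance(point, centers[i]))
    return nearest, point

def reduce_cluster(points):
    """Reduce step: the new center is the coordinate-wise mean of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_iteration(points, centers, distance):
    """One map -> shuffle -> reduce round, simulated in memory."""
    groups = defaultdict(list)  # plays the role of the shuffle phase
    for point in points:
        key, value = map_point(point, centers, distance)
        groups[key].append(value)
    # Keep the old center when a cluster receives no points.
    return [
        reduce_cluster(groups[i]) if i in groups else centers[i]
        for i in range(len(centers))
    ]

def kmeans(points, centers, distance, max_iter=20):
    """Repeat rounds until the centers stop changing or max_iter is reached."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers, distance)
        if new_centers == centers:
            break
        centers = new_centers
    return centers

points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
initial = [(0.0, 0.0), (10.0, 10.0)]
final_euclidean = kmeans(points, initial, euclidean)
final_manhattan = kmeans(points, initial, manhattan)
```

In a real Hadoop job, each round would typically be a separate MapReduce job, with the current centers distributed to every mapper and the reducers writing the updated centers for the next round. Swapping the `distance` argument is the only change needed to compare metrics in this sketch.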


Keywords: K-Means; Data Mining; Big Data; MapReduce; Hadoop.


Copyright (c) 2017 International Journal of Advanced Research in Computer Science