Sophisticated Fuzzy Clustering Algorithm for Duplicate Content Detection Based on Outlier Detection


Nancy Jasmine Goldena
Dr. S.P. Victor


This paper analyzes duplicate document detection in text using a fuzzy clustering method. The method assigns the data points in the documents to clusters as similar or dissimilar data. The algorithm proceeds through a series of stages. First, collections of documents with certain membership levels are compared. Suspicious text in the original document is matched against a list of other paragraphs based on the fuzzy compilation of membership data. Next, initial clusters are generated from the set of documents and evaluated by a local membership function using the modified fuzzy clustering algorithm. Finally, the patterns are mapped by the outlier detection method in iterative stages. The resulting similar and dissimilar data are clustered, and the results are compared with various existing algorithms.

Keywords: fuzzy clustering, duplicate document, text detection, outlier detection.
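
The stages summarized in the abstract (membership comparison, initial cluster generation, membership evaluation, outlier mapping) can be illustrated with a small sketch. This is not the authors' modified fuzzy clustering algorithm, which the abstract does not specify. It substitutes standard fuzzy c-means over term-frequency vectors and a plain distance threshold for the outlier step. The names `vectorize`, `memberships`, `fuzzy_c_means`, `detect`, and `DOCS`, the seed documents, and both thresholds are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def vectorize(docs):
    """Turn each document into an L2-normalised term-frequency vector."""
    counts = [Counter(re.findall(r"[a-z0-9]+", d.lower())) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(points, centers, m=2.0, eps=1e-9):
    """Standard fuzzy c-means membership of every point in every cluster."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # eps guards against division by zero when a point sits on a center.
        d = [max(dist(p, c), eps) for c in centers]
        rows.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return rows


def fuzzy_c_means(points, seeds, m=2.0, iters=50):
    """Alternate membership and center updates, starting from seed points."""
    centers = [list(points[i]) for i in seeds]
    for _ in range(iters):
        u = memberships(points, centers, m)
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers[j] = [sum(wi * p[t] for wi, p in zip(w, points)) / total
                          for t in range(len(points[0]))]
    return memberships(points, centers, m), centers


def detect(docs, seeds, outlier_dist=0.8, dup_cos=0.8):
    """Cluster documents, flag outliers, and list likely duplicate pairs.

    outlier_dist and dup_cos are illustrative thresholds, not tuned values.
    """
    x = vectorize(docs)
    u, centers = fuzzy_c_means(x, seeds)
    labels = [row.index(max(row)) for row in u]
    # Outlier: a document that is far even from its nearest cluster center.
    outliers = [i for i, p in enumerate(x)
                if dist(p, centers[labels[i]]) > outlier_dist]
    # Duplicate candidates: same cluster, not outliers, high cosine similarity.
    dups = [(i, j)
            for i in range(len(x)) for j in range(i + 1, len(x))
            if labels[i] == labels[j]
            and i not in outliers and j not in outliers
            and sum(a * b for a, b in zip(x[i], x[j])) >= dup_cos]
    return labels, outliers, dups


DOCS = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumped over the lazy dog",
    "fuzzy clustering assigns graded membership to each document",
    "fuzzy clustering assigns graded membership to every document",
    "stock prices fell sharply on monday",
]

labels, outliers, duplicates = detect(DOCS, seeds=(0, 2))
print("labels:", labels)
print("outliers:", outliers)
print("duplicate pairs:", duplicates)
```

In this toy collection the two near-identical sentence pairs should land in separate clusters, while the unrelated fifth sentence lies far from both centers and is set aside by the distance rule. A real system would need a principled choice of cluster count, seeds, and thresholds, none of which the abstract provides.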

