A Fuzzy Similarity Statistics Self-Constructing Clustering Algorithm for Text Classification with Geodesic Distance


V. Karthika
Mrs. C. Rathika


In text classification, feature clustering is a powerful method for reducing the dimensionality of feature vectors. The existing method uses a fuzzy similarity-based self-constructing feature clustering algorithm for text classification. With this algorithm, the derived membership functions closely match and properly describe the real distribution of the training data. The proposed method presents a fuzzy statistical similarity measure in place of the fuzzy set. It uses the fuzzy mean deviation to develop a fuzzy statistical similarity (FSS) measure for evaluating the similarity between feature vectors. Using FSS, it can merge cluster centers to extract land-cover information. Fuzzy statistics is a subject that combines fuzzy set theory with statistical methods. Fuzzy set theory is the basis for studying membership relationships arising from the fuzziness of phenomena. The similarity is a negative value; therefore, a small value is equivalent to a large similarity. Introducing the fuzzy mean deviation into the similarity measure allows it to exploit fuzzy sets in decision making. The FSS can take into account the difference of the same band between texts. The proposed method uses the geodesic distance to measure the distance between clusters. The geodesic distance is the length of the shortest path between data points and is used here for clustering. It reflects the true embedded manifold, and various cluster prototypes can also be used with this distance measure. The proposed approach is able to handle data lying on a low-dimensional manifold of the feature space while performing the clustering in the original feature space. The proposed method achieves better accuracy than the existing method.

Keywords: Fuzzy similarity statistics, feature clustering, feature extraction, feature reduction, text classification, geodesic distance
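
The abstract describes the geodesic distance as the shortest path between data points. A minimal sketch of that idea is given below. The abstract does not say how the neighbourhood graph is built or which shortest-path algorithm is used, so the symmetric k-nearest-neighbour graph, the use of Dijkstra's algorithm, and the function names `knn_graph` and `geodesic_distances` are all assumptions made for illustration, not the authors' implementation.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two points in the original feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph (assumed construction).

    Returns a dict mapping each point index to {neighbour index: edge length}.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (euclidean(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Shortest-path lengths from `source` to every node (Dijkstra's algorithm)."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            candidate = d + w
            if candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Illustrative data: 21 points sampled along a half circle of radius 1.
points = [(math.cos(math.pi * i / 20), math.sin(math.pi * i / 20)) for i in range(21)]
graph = knn_graph(points, k=2)
geodesic = geodesic_distances(graph, source=0)[20]
straight = euclidean(points[0], points[20])
print(geodesic, straight)
```

For the two end points of the half circle, the graph-based distance follows the curve and comes out close to the arc length, while the straight-line distance cuts across it. This is the sense in which a geodesic distance can reflect an embedded manifold even though the points stay in the original feature space.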


