A Fuzzy Similarity Statistics Self-Constructing Clustering Algorithm for Text Classification with Geodesic Distance
Abstract
In text classification, feature clustering is a powerful method for reducing the dimensionality of feature vectors. An existing method uses a fuzzy similarity-based self-constructing feature clustering algorithm for text classification. With this algorithm, the derived membership functions closely match and properly describe the real distribution of the training data. The proposed method replaces the fuzzy set with a fuzzy statistical similarity measure: it uses the fuzzy mean deviation to develop a fuzzy statistical similarity measure (FSS) for evaluating the similarity between feature vectors. Cluster centers can be merged by FSS to extract land-cover information. Fuzzy statistics combines fuzzy set theory with statistical methods, and fuzzy set theory is the basis for studying membership relationships arising from the fuzziness of phenomena. The similarity is a negative value, so a small value corresponds to a large similarity. By introducing the fuzzy mean deviation into the similarity measure, the method can exploit fuzzy sets in decision making. The FSS can take into account the difference of the same band between texts. The proposed method uses the geodesic distance to measure the distance between clusters. The geodesic distance is the length of the shortest path between points, and it reflects the true embedded manifold; it can also serve as the distance measure for various cluster prototypes. The proposed approach can handle data lying on a low-dimensional manifold of the feature space while performing the clustering in the original feature space. The proposed method achieves better accuracy than the existing method.
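The abstract does not say how the geodesic distance is computed. One common approximation, familiar from manifold learning, connects each point to its nearest neighbours and takes the shortest path through that graph. The following is a minimal sketch under that assumption, not the authors' implementation; the function names (`knn_graph`, `geodesic_distances`), the neighbourhood size `k`, and the toy data are all hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """Build an undirected graph linking each point to its k nearest neighbours.

    Assumption: a k-nearest-neighbour graph stands in for the manifold;
    the abstract does not specify the graph construction.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (euclidean(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Dijkstra's algorithm: shortest path length from source to every node."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data (hypothetical): five points on a unit semicircle, 45 degrees apart.
points = [(math.cos(i * math.pi / 4), math.sin(i * math.pi / 4)) for i in range(5)]
graph = knn_graph(points, k=2)
dist = geodesic_distances(graph, source=0)

straight = euclidean(points[0], points[4])  # chord across the semicircle
along = dist[4]                             # path that follows the arc
print(f"Euclidean: {straight:.3f}, geodesic: {along:.3f}")
```

On this toy arc the graph path between the two endpoints is longer than the straight chord, which illustrates why a geodesic measure can separate points that look close in the original feature space but lie far apart along the manifold.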
Keywords: fuzzy similarity statistics, feature clustering, feature extraction, feature reduction, text classification, geodesic distance
Article Details
COPYRIGHT
Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The journal allows the author(s) to retain publishing rights without restrictions.
- The journal allows the author(s) to hold the copyright without restrictions.