An Overview of VSM-Based Text Clustering Approaches

Francis Musembi Kwale

Abstract

Huge volumes of digital data are now commonplace, and text clustering has consequently become a crucial text mining technique. Organizing search information into groups (or clusters) not only makes search results more meaningful but also makes the system more efficient, since the user views only the relevant information and ignores the rest. An example is clustering the results of a web search engine into meaningful groups. Many text clustering approaches and corresponding algorithms exist, but none has been found to be sufficient. There is also insufficient understanding of the algorithms, as well as no agreed formal classification of them. There is thus a need for an in-depth study of the various algorithms. In this paper, we describe the Vector Space Model (VSM) method of text representation. We also give an overview of the text clustering approaches that apply the VSM. These include the distance-based approach, the feature extraction approach, the density-based approach, the grid-based approach, and the neural networks approach. We describe the characteristics of each, accompanied by a representative algorithm. The paper thus informs text mining researchers of the current state of affairs in text clustering algorithms.
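In the VSM, each document becomes a vector of term weights over the corpus vocabulary, and clustering algorithms then compare these vectors. The following is a minimal sketch, not the paper's method, assuming naive lowercase whitespace tokenization (no stemming or stop-word removal) and the plain tf × log(N/df) weighting, which is one common TF-IDF variant; the abstract does not specify a weighting scheme. It compares documents with cosine similarity, a measure often used by distance-based text clustering. The function names `tokenize`, `tfidf_vectors`, and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, no stemming or stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Represent each document as a TF-IDF vector over a shared, sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for toks in tokenized for term in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Inverse document frequency; a term present in every document gets weight 0.
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term frequency within this document
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two document vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "text clustering groups documents",
    "clustering groups similar documents",
    "neural networks learn weights",
]
vocab, vecs = tfidf_vectors(docs)
for i in range(1, len(docs)):
    print(i, round(cosine_similarity(vecs[0], vecs[i]), 3))
```

In this toy corpus the first two documents share terms and so receive a positive similarity, while the third shares none with the first and scores 0.0; a distance-based algorithm such as k-means would use exactly this kind of comparison to decide which documents belong in the same cluster.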


Keywords: clusters, text clustering, text mining, vector space model.