An Overview of VSM-Based Text Clustering Approaches

Francis Musembi Kwale


Large volumes of digital data are now commonplace, and text clustering has consequently become a crucial text mining technique. Organizing search information into groups (or clusters) not only makes the search results more meaningful but also makes the system more efficient, since the user views only the relevant information and ignores the rest. An example is clustering web search engine results into meaningful groups. Many text clustering approaches and their corresponding algorithms exist, but none has been found to be sufficient. The algorithms are also insufficiently understood, and there is no agreed formal classification of them. There is thus a need for an in-depth study of the various algorithms. In this paper, we describe the Vector Space Model (VSM) method of text representation. We also give an overview of the text clustering approaches that apply the VSM. These include the distance-based, feature extraction, density-based, grid-based, and neural network approaches. We describe the characteristics of each approach, accompanied by a representative algorithm. The paper thus informs text mining researchers about the current state of text clustering algorithms.

Keywords: clusters, text clustering, text mining, vector space model.
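
A minimal sketch of the VSM idea, assuming raw term-frequency weights and whitespace tokenization (both are illustrative choices, not taken from the paper, as are the hypothetical function names `tf_vectors` and `cosine_similarity`): each document becomes a vector over a shared vocabulary, and a distance-based approach could then compare documents by cosine similarity.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Represent each document as a term-frequency vector over a shared vocabulary.

    Assumption: tokens are lowercase whitespace-separated words; real systems
    usually add stop-word removal, stemming, and TF-IDF weighting.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
docs = [
    "text clustering groups documents",
    "clustering groups similar text",
    "neural networks learn features",
]
vocab, vectors = tf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

In this toy corpus the first two documents share most of their terms and so score a higher similarity than the first and third, which share none. A clustering algorithm would use such pairwise scores to decide which documents belong in the same group.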


Copyright (c) 2016 International Journal of Advanced Research in Computer Science