Document representation techniques and their effect on the document Clustering and Classification: A Review

Ksh Nareshkumar Singh, H. Mamata Devi, Anjana Kakoti Mahanta


Text is the most common form in which information is stored. When a search engine processes a query, the user obtains a large collection of text data, not all of which is relevant to the required information. This massive amount of text data therefore needs to be organised. Text mining is mainly concerned with analysing and processing text data, and it uses the standard data mining methods of classification and clustering. These two methods are used to arrange documents, which are usually represented by hundreds or thousands of words. The text in a document can be represented in various ways. In this paper, we present a study of various research papers that explore the area of text mining, including different document representation methods and their impact on clustering and classification results.


Text mining, Document representation, Clustering, Classification


Copyright (c) 2017 International Journal of Advanced Research in Computer Science