Main Article Content

Prabhjot Kaur
Preetpal Kaur Buttar


Stemming is a method of deriving the root word from an inflected word. The stemming process is often called conflation and is performed by stemmers, or stemming algorithms. A stemming algorithm reduces all words that share the same base to a common form, and it is the basic building block of a stemmer. Stemmer development is language-dependent: it requires specific knowledge of the language and spell checking for that language. This paper presents an overview of the different stemming techniques and algorithms that researchers have used for stemming in different languages.
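To make the idea of conflation concrete, the following is a minimal sketch of a suffix-stripping stemmer for English. It is not taken from the paper and is not Porter's or Lovins' algorithm: the suffix list, the minimum stem length, and the function names `stem` and `conflate` are all assumptions chosen purely for illustration.

```python
# Toy suffix-stripping stemmer. Illustrative only: the rule list below is a
# hypothetical example, not the rule set of any published stemmer.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order, longest first


def stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied; the word is returned unchanged


def conflate(words):
    """Group words that reduce to the same stem (conflation)."""
    groups = {}
    for w in words:
        groups.setdefault(stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    print(conflate(["connect", "connected", "connecting", "connects"]))
```

A fixed rule list like this only fits one language and handles only the simplest inflections, which illustrates why stemmer development depends on language-specific knowledge.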


