REVIEW ON STEMMING TECHNIQUES

Prabhjot Kaur, Preetpal Kaur Buttar

Abstract


Stemming is the process of deriving the root word from an inflected word. It is often called conflation and is carried out by stemmers, which implement stemming algorithms. A stemming algorithm reduces all words that share the same base to a common form, and it is the basic building block of a stemmer. Stemmer development is language dependent: it requires knowledge of the specific language and spell checking for that language. This paper presents an overview of the stemming techniques and algorithms that researchers have used for stemming in different languages.
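As a rough illustration of conflation, the following minimal sketch strips a few English suffixes so that related word forms map to one common form. It is a toy example, not any of the algorithms surveyed in the paper; the suffix list, the minimum stem length, and the names `SUFFIXES`, `toy_stem`, and `conflate` are assumptions made only for this illustration.

```python
# Toy suffix-stripping stemmer: illustrative only, not a published algorithm.
# The suffix list and the minimum stem length are assumptions for this sketch.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order, longest first


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix if at least min_stem_len characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group words that reduce to the same stem."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    print(conflate(["connect", "connected", "connecting", "connects"]))
```

All four input forms fall into a single group under the stem `connect`. The minimum stem length guards against over-stripping short words such as "sing". A practical stemmer needs language-specific rules well beyond this, which is the language dependence the abstract describes.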


Keywords


Stemming; Stemming techniques; Survey






DOI: https://doi.org/10.26483/ijarcs.v9i5.6308





Copyright (c) 2018 International Journal of Advanced Research in Computer Science