ENGLISH TO HINDI TRANSLITERATION SYSTEM USING COMBINATION-BASED APPROACH

Baljeet Kaur Dhindsa; Dharam Veer Sharma

doi:10.26483/ijarcs.v8i8.4801


Published: Oct 27, 2017


Keywords:

Transliteration, English-to-Hindi Transliteration, Combination-based Transliteration.

Baljeet Kaur Dhindsa

Guru Gobind Singh College for Women

Dharam Veer Sharma

Abstract

Transliteration plays a significant role in machine translation, which has many applications such as cross-lingual information retrieval, communication, and question answering. The main objective of this paper is to provide a method for transliterating named entities from English to Hindi. The proposed method consists of two modules, both of which apply a phoneme-based approach to transliterate named entities. Module-I uses the CMU Pronouncing Dictionary, a collection of 133,270 words along with their pronunciations. If the word to be transliterated is not found in the CMU Pronouncing Dictionary, Module-II is used. Module-II is based on a 5-gram model, in which up to five letters (two to the left, two to the right, and the target letter itself) are used to generate the transliterated target letter. The system has been tested on a database of 2,408 North-Indian names, and Google Input Tool for Windows has been used for a comparative study. The proposed system achieves a word accuracy of 70.22%, against 58.73% for Google Input Tool.
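The two-module combination can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the dictionary entry, the phoneme-to-Devanagari tables and the letter-context rules (`PRONOUNCING_DICT`, `CONSONANTS`, `VOWELS`, `RULES`) are hypothetical stand-ins. The back-off from a 5-letter window to a 3-letter window to the target letter alone is an assumed reading of the 5-gram model. Word accuracy uses the usual exact-match definition.

```python
"""Toy sketch of a combination-based English-to-Hindi transliterator.

Assumption: every table below is a tiny hypothetical stand-in, not the
paper's data. A real Module-I would load the full CMU Pronouncing Dictionary;
a real Module-II would use rules derived from a corpus of names.
"""

# Hypothetical stand-in for a CMU-style entry: word -> ARPAbet phonemes.
PRONOUNCING_DICT = {"MALA": ["M", "AA1", "L", "AA0"]}

# Hypothetical phoneme tables; vowels map to (independent form, matra form).
CONSONANTS = {"M": "म", "L": "ल", "R": "र", "S": "स", "SH": "श", "V": "व"}
VOWELS = {"AA": ("आ", "ा"), "IH": ("इ", "ि"), "AH": ("अ", "")}


def module_one(phones):
    """Module-I sketch: map an ARPAbet phoneme sequence to Devanagari."""
    out, after_consonant = [], False
    for phone in phones:
        base = phone.rstrip("012")  # drop CMU stress markers
        if base in VOWELS:
            independent, matra = VOWELS[base]
            # A vowel after a consonant is written as a dependent sign (matra).
            out.append(matra if after_consonant else independent)
            after_consonant = False
        else:
            out.append(CONSONANTS[base])  # KeyError for uncovered phonemes
            after_consonant = True
    return "".join(out)


# Hypothetical Module-II rules keyed by letter context; "_" is a word boundary.
RULES = {
    "_ram_": "ा",  # 5-letter context: the 'a' in "ram"
    "_sh": "श",    # 3-letter context: word-initial 's' before 'h'
    "shi": "",     # 3-letter context: 'h' after 's' is absorbed into श
    "hiv": "ि",    # 3-letter context: 'i' after a consonant becomes a matra
    "r": "र", "m": "म", "s": "स", "h": "ह", "v": "व", "a": "अ", "i": "इ",
}


def module_two(word):
    """Module-II sketch: transliterate each letter from a 5-letter window."""
    padded = "__" + word + "__"
    out = []
    for i in range(len(word)):
        window = padded[i:i + 5]  # two left, target, two right
        # Back off from the widest context to the target letter alone.
        for key in (window, window[1:4], window[2]):
            if key in RULES:
                out.append(RULES[key])
                break
        else:
            out.append(window[2])  # no rule at all: copy the letter through
    return "".join(out)


def transliterate(word):
    """Use Module-I if the word is in the dictionary, otherwise Module-II."""
    phones = PRONOUNCING_DICT.get(word.upper())
    if phones is not None:
        return module_one(phones)
    return module_two(word.lower())


def word_accuracy(predicted, reference):
    """Fraction of words whose whole output exactly matches the reference."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


for name in ["Mala", "Ram", "Shiv"]:
    print(name, "->", transliterate(name))
# Mala -> माला   (Module-I)
# Ram -> राम     (Module-II)
# Shiv -> शिव    (Module-II)
```

With a full pronouncing dictionary and a complete rule table in place of the toy ones, the same control flow would send every word found in the dictionary to Module-I and every other word to Module-II.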


Issue

Vol. 8 No. 8 (2017): September-October

Section

Articles

COPYRIGHT

Submission of a manuscript implies that the work described has not been published before, that it is not under consideration for publication elsewhere, and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.

Author Biography

Baljeet Kaur Dhindsa, Guru Gobind Singh College for Women

Assistant Professor, Department of Computer Science and Applications

References

S. Karimi, F. Scholer, and A. Turpin, “Machine Transliteration Survey,” ACM Computing Surveys, vol. 43, no. 3, pp. 1-46, 2011.

S. Singh, English–Hindi Translation Grammar. New Delhi: Prabhat Prakashan, 2010, pp. 69-81.

A. Kumaran, M. M. Khapra, and P. Bhattacharyya, “Compositional Machine Transliteration,” ACM Transactions on Asian Language Information Processing (TALIP), vol. 9, no. 4, pp. 1-29, 2010.

G. Nicolai, B. Hauer, M. Salameh, A. S. Arnaud, Y. Xu, L. Yao, and G. Kondrak, “Multiple System Combination for Transliteration,” in Proceedings of the Fifth Named Entity Workshop, joint with the 53rd ACL and the 7th IJCNLP, Beijing, China, July 26-31, 2015, pp. 72-77.

S. Mathur and V. P. Saxena, “Hybrid Approach to English-Hindi Name Entity Transliteration,” in 2014 IEEE Students' Conference on Electrical, Electronics and Computer Science (SCEECS), March 1-2, 2014, pp. 1-5.

A. Das, A. Ekbal, T. Mandal, and S. Bandyopadhyay, “English to Hindi Machine Transliteration System at NEWS 2009,” in Proceedings of the 2009 Named Entities Workshop, ACL-IJCNLP 2009, 2009, pp. 80-83.

R. Haque, S. Dandapat, A. K. Srivastava, S. K. Naskar, and A. Way, “English–Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009,” in Proceedings of the 2009 Named Entities Workshop, ACL-IJCNLP 2009, 2009, pp. 104-107.

D. Bhalla, N. Joshi, and I. Mathur, “Rule Based Transliteration Scheme for English to Punjabi,” International Journal on Natural Language Computing (IJNLC), vol. 2, no. 2, pp. 67-73, Apr. 2013.

B. J. Kang and K. S. Choi, “Automatic Transliteration and Back Transliteration by Decision Tree Learning,” in Proceedings of the Conference on Language Resources and Evaluation, Athens, Greece, 2000, pp. 1135-1411.

The CMU Pronouncing Dictionary. [Online]. Available: https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/. [Accessed: Jan 14, 2014].

Google Input Tool. [Online]. Available: https://www.google.com/inputtools/windows/. [Accessed: Feb 28, 2017].
