Detection and Recognition of Hindi Text from Natural Scenes and its Transliteration to English
Article Details
COPYRIGHT
Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as this can lead to productive exchanges, as well as earlier and greater citation of published work.
- The journal allows the author(s) to retain publishing rights without restrictions.
- The journal allows the author(s) to hold the copyright without restrictions.
References
B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 2963–2970, 2010, doi: 10.1109/CVPR.2010.5540041.
S. Bhargava and E. Yablonovitch, “Lowering HAMR near-field transducer temperature via inverse electromagnetic design,” IEEE Trans. Magn., vol. 51, no. 4, 2015, doi: 10.1109/TMAG.2014.2355215.
S. Karim, A. A. Laghari, A. Halepoto, A. Manzoor, N. Hussain Phulpoto, and A. Ali, “Vehicle detection in satellite imagery using maximally stable extremal regions,” IJCSNS Int. J. Comput. Sci. Netw. Secur., vol. 18, no. 4, pp. 75–78, 2018.
I. Ahmad and G. A. Fink, “Handwritten Arabic text recognition using multi-stage sub-core-shape HMMs,” Int. J. Doc. Anal. Recognit., vol. 22, no. 3, pp. 329–349, 2019, doi: 10.1007/s10032-019-00339-8.
X. Zhou et al., “EAST: An efficient and accurate scene text detector,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 5551–5560, 2017.
J. Wang and X. Hu, “Gated recurrent convolution neural network for OCR,” Adv. Neural Inf. Process. Syst. (NIPS), 2017.
P. Shivakumara, D. Tang, M. Asadzadehkaljahi, T. Lu, U. Pal, and M. H. Anisi, “CNN-RNN based method for license plate recognition,” CAAI Trans. Intell. Technol., vol. 3, no. 3, pp. 169–175, 2018, doi: 10.1049/trit.2018.1015.
L. Giridhar, A. Dharani, and V. Guruviah, “A novel approach to OCR using image recognition based classification for ancient Tamil inscriptions in temples,” arXiv, pp. 1–8, 2019.
S. Prajapati, S. R. Joshi, A. Maharjan, and B. Balami, “Evaluating performance of Nepali script OCR using Tesseract and artificial neural network,” Proc. 2018 IEEE 3rd Int. Conf. Comput. Commun. Secur. (ICCCS), pp. 104–107, 2018, doi: 10.1109/CCCS.2018.8586808.
A. S., J. Yankey, and E. O., “An automatic number plate recognition system using OpenCV and Tesseract OCR engine,” Int. J. Comput. Appl., vol. 180, no. 43, pp. 1–5, 2018, doi: 10.5120/ijca2018917150.
P. Duygulu, K. Barnard, J. F. G. de Freitas, and D. A. Forsyth, “Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary,” Lect. Notes Comput. Sci., vol. 2353, pp. 97–112, 2002, doi: 10.1007/3-540-47979-1_7.
T. Deselaers, S. Hasan, O. Bender, and H. Ney, “A deep learning approach to machine transliteration,” Proc. 4th Workshop Stat. Mach. Transl., p. 233, 2009, doi: 10.3115/1626431.1626476.
M. Alam and S. ul Hussain, “Sequence to sequence networks for Roman-Urdu to Urdu transliteration,” arXiv, pp. 1–7, 2017.
Y. Wu et al., “Google's neural machine translation system: Bridging the gap between human and machine translation,” arXiv:1609.08144, 2016. [Online]. Available: http://arxiv.org/abs/1609.08144.
T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan, “Recognizing text with perspective distortion in natural scenes,” Proc. IEEE Int. Conf. Comput. Vis., pp. 569–576, 2013, doi: 10.1109/ICCV.2013.76.
P. Dollár, R. Appel, S. Belongie, and P. Perona, “Fast feature pyramids for object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 8, pp. 1532–1545, 2014, doi: 10.1109/TPAMI.2014.2300479.
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 770–778, 2016, doi: 10.1109/CVPR.2016.90.
M. Liao, B. Shi, X. Bai, X. Wang, and W. Liu, “TextBoxes: A fast text detector with a single deep neural network,” 31st AAAI Conf. Artif. Intell. (AAAI), pp. 4161–4167, 2017.
Y. Zhu and J. Du, “Sliding line point regression for shape robust scene text detection,” arXiv, pp. 3735–3740, 2018.
S. R. Laskar, A. Dutta, P. Pakray, and S. Bandyopadhyay, “Neural machine translation: English to Hindi,” 2019 IEEE Conf. Inf. Commun. Technol. (CICT), pp. 25–30, 2019, doi: 10.1109/CICT48419.2019.9066238.
W. Wang et al., “Shape robust text detection with progressive scale expansion network,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 9328–9337, 2019, doi: 10.1109/CVPR.2019.00956.
T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 2117–2125, 2017.
F. Milletari, N. Navab, and S. A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” Proc. 2016 4th Int. Conf. 3D Vision (3DV), pp. 565–571, 2016, doi: 10.1109/3DV.2016.79.
A. Shrivastava, A. Gupta, and R. Girshick, “Training region-based object detectors with online hard example mining,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 761–769, 2016, doi: 10.1109/CVPR.2016.89.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2323, 1998, doi: 10.1109/5.726791.
S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” 32nd Int. Conf. Mach. Learn. (ICML), vol. 1, pp. 448–456, 2015.
X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” J. Mach. Learn. Res., vol. 15, pp. 315–323, 2011.
B. Leibe, J. Matas, N. Sebe, and M. Welling, “Preface,” Lect. Notes Comput. Sci., vol. 9906 LNCS, pp. VII–IX, 2016, doi: 10.1007/978-3-319-46493-0.
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 248–255, 2009, doi: 10.1109/cvprw.2009.5206848.
K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in 2015 IEEE Int. Conf. Comput. Vis. (ICCV), 2015, pp. 1026–1034, doi: 10.1109/ICCV.2015.123.
A. Khan and A. Sarfaraz, “RNN-LSTM-GRU based language transformation,” Soft Comput., vol. 23, no. 24, pp. 13007–13024, 2019, doi: 10.1007/s00500-019-04281-z.