Efficient Compression of Binarized Tainted Documents



Tainted documents are degraded documents of low visual quality and worn appearance. Typical taints include disparity variation, smear, ooze, and uneven illumination. Binarization can improve the visual quality of such documents, but binarizing badly tainted documents is difficult because the variation between the document background and the foreground text is hard to identify. The proposed system uses Otsu binarization, which it applies to every kind of taint. The technique addresses the variation between the background and the foreground text by calculating the optimum threshold separating the two classes, chosen so that their combined spread (within-class variance) is minimal or, equivalently, so that their between-class variance is maximal; this improves visual quality. Enhancing visual quality also increases the document size, so compression is performed on the binarized document to reduce it. The compression technique proposed in this paper is run-length coding, a lossless technique that is well suited to binary images.
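The pipeline described above (Otsu threshold, binarization, run-length coding) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the flat list of 8-bit gray levels, and the (value, run length) pair representation are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the gray level t that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other. Maximizing the
    between-class variance is equivalent to minimizing the combined
    (within-class) spread of the two classes.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of their gray levels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Map each pixel to 0 (dark, assumed text) or 1 (bright, assumed background)."""
    return [1 if p > t else 0 for p in pixels]


def rle_encode(bits: Sequence[int]) -> List[Tuple[int, int]]:
    """Run-length encode a sequence as (value, run length) pairs."""
    runs: List[Tuple[int, int]] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def rle_decode(runs: Sequence[Tuple[int, int]]) -> List[int]:
    """Invert rle_encode; the round trip is lossless."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


if __name__ == "__main__":
    # Hypothetical scanline: four dark text pixels, then four bright ones.
    row = [20, 25, 30, 22, 200, 210, 205, 198]
    t = otsu_threshold(row)
    bits = binarize(row, t)
    runs = rle_encode(bits)
    assert rle_decode(runs) == bits
    print(t, bits, runs)
```

Because a binarized page consists mostly of long runs of identical pixels, the list of runs is typically much shorter than the pixel list, which is why run-length coding suits binary images; on noisy rows with many short runs the gain shrinks.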


Binarization; Taintation; Compression; Run Length Coding


DOI: https://doi.org/10.26483/ijarcs.v9i2.5520



Copyright (c) 2018 International Journal of Advanced Research in Computer Science