International Journal of Computational Linguistics Research
An Analysis of Optical Character Recognition (OCR) Methods
Nabeel Ashraf, Syed Yasser Arafat, Muhammad Javed Iqbal
Department of Computer Science and Information Technology, Mirpur University of Science and Technology (MUST)
Department of Computer Science, University of Engineering and Technology Taxila (UET)
Abstract: This survey paper presents a comprehensive study of Urdu Optical Character Recognition (OCR) methodologies.
Its main focus is a detailed investigation of the techniques used to recognize text written in the Nastaliq, Naskh and
similar script styles, which are used to write languages such as Urdu, Arabic, Pashto and Sindhi. Several methods for
recognizing and classifying text in Urdu-like cursive scripts are discussed. The survey briefly describes and compares each
method, covering handwritten, printed and online text recognition. For each OCR approach, the phases of pre-processing,
segmentation, feature extraction, classification and, finally, recognition are discussed. After a comprehensive analysis of
all methodologies, a critique of existing work and directions for future work on Urdu cursive scripts, i.e. the Naskh and
Nastaliq scripts, are also proposed.
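To make the five OCR phases concrete, here is a deliberately minimal sketch. It is not a method from any surveyed paper. It assumes a single, already-deskewed text line given as a grayscale grid. Phase 1 uses a global threshold. Phase 2 cuts glyphs at blank columns of the vertical projection profile and orders them right to left, as Urdu is read. Phase 3 uses 2 x 2 zoning ink densities as features. Phase 4 is a 1-nearest-neighbour classifier. All names (`to_gray`, `TEMPLATES`, the labels `tall` and `short`) and the toy image are hypothetical illustrations.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # 2-D grid of pixels, row-major


def to_gray(rows: Sequence[str]) -> Image:
    """Toy helper (hypothetical): '#' is a black ink pixel, anything else is white."""
    return [[0 if ch == "#" else 255 for ch in row] for row in rows]


def preprocess(gray: Image, threshold: int = 128) -> Image:
    """Phase 1: global-threshold binarization (ink -> 1, background -> 0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary: Image) -> List[Image]:
    """Phase 2: cut the line at ink-free columns of the vertical projection.

    Segments are returned right to left, the reading order of Urdu.
    """
    width = len(binary[0])
    ink = [sum(row[c] for row in binary) for c in range(width)]
    spans, start = [], None
    for c, count in enumerate(ink):
        if count and start is None:
            start = c  # entering a glyph
        elif not count and start is not None:
            spans.append((start, c))  # leaving a glyph
            start = None
    if start is not None:
        spans.append((start, width))  # glyph touches the right edge
    return [[row[a:b] for row in binary] for a, b in reversed(spans)]


def extract_features(glyph: Image, zones: int = 2) -> List[float]:
    """Phase 3: zoning features, ink density in each cell of a zones x zones grid."""
    h, w = len(glyph), len(glyph[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            cells = [glyph[r][c]
                     for r in range(zr * h // zones, (zr + 1) * h // zones)
                     for c in range(zc * w // zones, (zc + 1) * w // zones)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def classify(feats: List[float], templates: Dict[str, List[float]]) -> str:
    """Phase 4: 1-nearest-neighbour match by squared Euclidean distance."""
    return min(templates,
               key=lambda label: sum((f - t) ** 2 for f, t in zip(feats, templates[label])))


def recognize(gray: Image, templates: Dict[str, List[float]]) -> List[str]:
    """Phase 5: run all phases and emit class labels in reading order."""
    glyphs = segment(preprocess(gray))
    return [classify(extract_features(g), templates) for g in glyphs]


# Hypothetical two-class template set and a toy 4 x 7 "text line".
TEMPLATES = {"tall": [1.0, 1.0, 1.0, 1.0], "short": [0.0, 0.0, 1.0, 1.0]}
SAMPLE = ["##.....",
          "##.....",
          "##...##",
          "##...##"]

if __name__ == "__main__":
    print(recognize(to_gray(SAMPLE), TEMPLATES))  # ['short', 'tall']
```

Blank-column segmentation is used here only because it is the shortest to show. Nastaliq ligatures are written diagonally and frequently overlap horizontally, so a practical system would need a more robust segmentation stage, or no explicit segmentation at all.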
Keywords: OCR, Urdu text, Text Recognition