Academic Paper

Comparative Analysis of Outcomes of Tesseract OCR for Different Languages
Document Type
Conference
Source
2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), pp. 95-100, Mar. 2024
Subject
Computing and Processing
Image resolution
Barium
Image color analysis
Optical character recognition
Noise
Real-time systems
Communications technology
Gujarati
Hindi
Optical Character Recognition (OCR)
Tesseract
Language
Abstract
Hindi and Gujarati, written in the Devanagari script and the closely related Gujarati script respectively, have contributed to the culture of India through literature and human interaction. As physical documents age, the rich literature of these languages may be lost to future generations. Optical character recognition (OCR) offers a technological means of preserving it. The Tesseract OCR engine supports converting images of Hindi and Gujarati text into equivalent text output. A major challenge in producing good-quality text output is acquiring high-resolution, noise-free images. This study compares the performance of the Tesseract OCR engine on Gujarati and Hindi and assesses how effective it is for real-time applications.