Academic Paper

Comparative Analysis of Outcomes of Tesseract OCR for Different Languages

Document Type

Conference

Author

Source

2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), pp. 95-100, Mar. 2024

Subject

Computing and Processing
Image resolution
Barium
Image color analysis
Optical character recognition
Noise
Real-time systems
Communications technology
Gujarati
Hindi
Optical Character Recognition (OCR)
Tesseract

Language

Abstract

Hindi and Gujarati, written in Devanagari and the closely related Gujarati script respectively, have contributed to the culture of India through literature and human interaction. As physical documents age, the rich literature of these languages may not survive for future generations. Optical character recognition (OCR) offers a technological means of preserving it. The Tesseract OCR engine supports converting images of Hindi and Gujarati text into equivalent text output. A major challenge in producing good-quality text output is acquiring high-resolution, noise-free images. This study compares the performance of the Tesseract OCR engine on Gujarati and Hindi and assesses how effective it is for real-time applications.

