Academic Paper

Generating a Standardized Dataset: Gurmukhi Offline Handwritten Collection of Tehsil and Sub-Tehsil names from Punjab
Document Type
Conference
Source
2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 1-5, Mar. 2024
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
Robotics and Control Systems
Signal Processing and Analysis
Computers
Handwriting recognition
Image recognition
Text recognition
Databases
Market research
Libraries
Handwriting Recognition
Tehsil classification
Sub-Tehsil Classification
Gurmukhi Dataset
Gurmukhi word
benchmarking
Transfer Learning
Language
ISSN
2769-2884
Abstract
Interpreting handwritten documents in Indian languages has drawn considerable research attention in recent years. Handwriting recognition is the ability of a computer to convert human handwriting into machine-readable text. One significant challenge is the lack of a standard library of handwritten texts in Indian languages, which is essential for assessing document recognition algorithms and comparing them with one another. In particular, because little Gurmukhi script data is publicly available, a structured evaluation of techniques for recognizing Gurmukhi tehsil and sub-tehsil names has not been feasible. To address this gap, this paper presents the construction of an unconstrained Gurmukhi handwritten words database (GHWD). The GHWD comprises 62,000 handwritten words written by 40 distinct editors, covering 77 tehsil and 78 sub-tehsil names. During data generation, each editor wrote every Gurmukhi word ten times. Editors were selected from diverse backgrounds and age groups to ensure a varied and representative dataset.
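The figures quoted in the abstract can be cross-checked with a little arithmetic: 40 editors, each writing 77 + 78 = 155 names ten times, gives 62,000 word images. The Python sketch below does that check. It is not the authors' tooling; the function names and the `(writer_id, class_id, repetition)` key layout are hypothetical, introduced here only to illustrate how such a collection might be indexed.

```python
from itertools import product

# Figures stated in the abstract.
NUM_WRITERS = 40        # "40 distinct editors"
NUM_TEHSILS = 77        # tehsil names
NUM_SUB_TEHSILS = 78    # sub-tehsil names
REPETITIONS = 10        # each editor wrote every word ten times


def expected_sample_count(writers, classes, repetitions):
    """Total word images if every writer writes every class name `repetitions` times."""
    return writers * classes * repetitions


def sample_keys(writers, classes, repetitions):
    """Yield hypothetical (writer_id, class_id, repetition) keys, one per word image."""
    return product(range(writers), range(classes), range(repetitions))


num_classes = NUM_TEHSILS + NUM_SUB_TEHSILS
total = expected_sample_count(NUM_WRITERS, num_classes, REPETITIONS)

print(f"classes: {num_classes}, expected word images: {total}")
```

With the abstract's numbers, the class count is 155 and the expected total matches the stated 62,000 words, so the reported dataset size is internally consistent.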