Academic Paper
Generating a Standardized Dataset: Gurmukhi Offline Handwritten Collection of Tehsil and Sub-Tehsil names from Punjab
Document Type
Conference
Source
2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 1-5, Mar. 2024
ISSN
2769-2884
Abstract
In recent years, the recognition of handwritten documents in Indian languages has received considerable research attention. Handwriting recognition is the ability of a computer to convert human handwriting into machine-readable text. A significant obstacle is the lack of a standardized collection of handwritten texts in Indian languages, which is essential for assessing the performance of document recognition algorithms and for comparing them with one another. In particular, because little Gurmukhi script data is publicly available, a structured evaluation of techniques for recognizing handwritten Gurmukhi tehsil and sub-tehsil names has not been feasible. To address this gap, this paper presents the construction of an unconstrained Gurmukhi handwritten words database (GHWD). The GHWD comprises 62,000 handwritten words written by 40 distinct writers and covers 77 tehsil and 78 sub-tehsil names. During data collection, each writer wrote every Gurmukhi word ten times. Writers were selected from diverse backgrounds and age groups to ensure a varied and representative dataset.
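The counts in the abstract are mutually consistent: (77 + 78) names × 10 repetitions × 40 writers = 62,000 words, i.e., 400 samples per name and 1,550 per writer. The minimal Python sketch below makes that composition explicit by enumerating one key per sample. The `(writer_id, word_id, repetition)` key and the integer word ids are hypothetical illustrations chosen for this check; they are not the GHWD's actual file naming or labelling scheme.

```python
from itertools import product

# Counts stated in the abstract.
NUM_WRITERS = 40
NUM_TEHSIL_NAMES = 77
NUM_SUB_TEHSIL_NAMES = 78
REPETITIONS = 10  # each writer wrote every word ten times


def enumerate_samples():
    """Yield one hypothetical (writer_id, word_id, repetition) key per sample.

    Word ids 0..76 stand in for tehsil names and 77..154 for sub-tehsil
    names; this numbering is illustrative only, not the dataset's own.
    """
    num_words = NUM_TEHSIL_NAMES + NUM_SUB_TEHSIL_NAMES
    return product(range(NUM_WRITERS), range(num_words), range(REPETITIONS))


samples = list(enumerate_samples())
num_classes = NUM_TEHSIL_NAMES + NUM_SUB_TEHSIL_NAMES

print(num_classes)                          # 155 word classes
print(len(samples))                         # 62000 samples in total
print(len(samples) // num_classes)          # 400 samples per word class
print(len(samples) // NUM_WRITERS)          # 1550 samples per writer
```

Because every writer contributes the same number of samples for every word, the dataset is perfectly balanced across classes, which is convenient when comparing recognizers on per-class accuracy.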