Academic Article

Emerging resources, enduring challenges: a comprehensive study of Kashmiri parallel corpus
Document Type
Original Paper
Source
AI & SOCIETY: Journal of Knowledge, Culture and Communication, pp. 1-19
Subject
Parallel corpora
Kashmiri
Machine translation
Language
Language
English
ISSN
0951-5666
1435-5655
Abstract
This study addresses the critical shortage of parallel corpora for the Kashmiri language, a significant barrier to advancing language processing technologies for under-resourced languages. Despite Kashmiri's rich cultural heritage, the development of language technology resources, especially parallel corpora, has been notably limited. We analyze in detail the only available parallel corpora for Kashmiri and use these datasets to develop and evaluate Neural Machine Translation (NMT) models. Through this evaluation, we categorize errors and assess whether the corpora are adequate in quality and quantity to support effective language processing tasks. We also investigate the reasons behind the scarcity of high-quality resources and identify the challenges inherent in creating robust parallel corpora for Kashmiri. By proposing solutions to these challenges, the study aims to contribute to the revitalization and global recognition of the Kashmiri language. It addresses a significant gap in the field of language technology and emphasizes the importance of parallel corpora in preserving linguistic diversity and facilitating technological advancement.