Academic Paper
Navigating Bengali Linguistics: Insights from Machine and Deep Learning Perspectives for Categorization of Sentences
Document Type
Conference
Source
2023 26th International Conference on Computer and Information Technology (ICCIT), pp. 1-6, Dec. 2023
Abstract
This paper explores the classification of Bengali sentences, an important step toward advancing natural language processing (NLP) for linguistically diverse languages. The study applies a range of machine learning and deep learning methods to Bengali. It builds on previous work on sentence attributes, sentence simplification algorithms, context-free grammar applications, and grammar-rule-based sentence transformations. At its core, the study compares three vectorization approaches (BERT-based, GloVe-based, and TF-IDF-based) for classifying Bengali sentences as simple, complex, or compound. Several classifiers were applied on top of these representations, and the BERT-based vectorization performed best, reaching 98.07% overall test accuracy. The authors take this result as evidence of the model's robustness and versatility. They suggest that it may generalize beyond the study's dataset to custom, real-world datasets. The experiments use the recently published large-scale Bangla Transformation of Sentence Dataset (BTSD), which contains 3,793 samples. This dataset fills a notable gap in the existing literature and enables more detailed analysis. The findings contribute to a better understanding of Bengali sentence structure and have implications for computational linguistics and NLP more broadly.
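The TF-IDF branch of the approach described above can be illustrated with a minimal, dependency-free Python sketch. This abstract does not name the classifiers used, so the sketch uses a nearest-centroid classifier with cosine similarity as a stand-in, not as the authors' method. The toy sentences, labels, and function names (`tokenize`, `fit_idf`, `tfidf`, `cosine`, `fit_centroids`, `predict`) are hypothetical and are not taken from BTSD or the authors' code. The IDF term uses the smoothed form ln((1 + N) / (1 + df)) + 1, which is also scikit-learn's default for `TfidfVectorizer`. The tokenizer simply splits on whitespace after dropping the danda (।) and commas.

```python
import math
from collections import Counter

# Hypothetical toy training data for illustration only; NOT drawn from BTSD.
TRAIN = [
    ("আমি ভাত খাই।", "simple"),
    ("সে স্কুলে যায়।", "simple"),
    ("আমি ভাত খাই এবং সে রুটি খায়।", "compound"),
    ("সে গান গায় কিন্তু আমি নাচি।", "compound"),
    ("যখন বৃষ্টি হয়, তখন আমি বাড়িতে থাকি।", "complex"),
    ("যদি তুমি আসো, তবে আমি যাব।", "complex"),
]


def tokenize(sentence):
    """Whitespace tokenizer that drops the danda (।) and commas."""
    for mark in ("।", ","):
        sentence = sentence.replace(mark, " ")
    return sentence.split()


def fit_idf(docs):
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1 for every vocabulary term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """L2-normalized TF-IDF vector as a dict; out-of-vocabulary tokens are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(train):
    """Return the IDF table and the mean TF-IDF vector (centroid) of each label."""
    docs = [tokenize(s) for s, _ in train]
    idf = fit_idf(docs)
    sums, sizes = {}, Counter()
    for doc, (_, label) in zip(docs, train):
        acc = sums.setdefault(label, {})
        for t, w in tfidf(doc, idf).items():
            acc[t] = acc.get(t, 0.0) + w
        sizes[label] += 1
    centroids = {
        label: {t: w / sizes[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def predict(sentence, idf, centroids):
    """Pick the label whose centroid is most cosine-similar to the sentence."""
    vec = tfidf(tokenize(sentence), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


idf, centroids = fit_centroids(TRAIN)
print(predict("আমি পড়ি এবং সে খেলে।", idf, centroids))      # → compound
print(predict("যখন সে আসে, তখন আমি যাব।", idf, centroids))  # → complex
print(predict("আমি স্কুলে যাই।", idf, centroids))            # → simple
```

In a bag-of-words representation like this, coordinating conjunctions such as এবং and correlative pairs such as যখন/তখন carry much of the discriminative weight. That offers one plausible reason why even TF-IDF features can separate the three classes. The sketch does not show how such a model would perform at BTSD scale or how it compares with the BERT-based results.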