Academic Paper

Navigating Bengali Linguistics: Insights from Machine and Deep Learning Perspectives for Categorization of Sentences
Document Type
Conference
Source
2023 26th International Conference on Computer and Information Technology (ICCIT), pp. 1-6, Dec. 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Deep learning
Training
Navigation
Linguistics
User experience
Robustness
Text processing
Natural Language Processing
BERT
Deep Learning
Sentence Classification
Bengali Text Processing
Language
Abstract
This paper studies the classification of Bengali sentences into simple, complex, and compound categories, a task relevant to advancing natural language processing (NLP) for languages with rich linguistic diversity. It draws on previous studies of sentence attributes, simplification algorithms, context-free grammar applications, and grammar-rule-based transformations. Three vectorization approaches (BERT-based, GloVe-based, and TF-IDF-based) were used to represent the sentences, and a series of classifiers was applied to these representations. The BERT-based vectorization performed best, reaching 98.07% overall test accuracy, which the authors take as evidence of the model's robustness and of its potential applicability beyond the designated dataset to custom, real-life datasets. The experiments use a recently published large-scale Bengali sentence dataset, the Bangla Transformation of Sentence Dataset (BTSD), which consists of 3793 samples and fills a significant gap in the existing literature by enabling more detailed analysis. The findings contribute to the understanding of Bengali sentence structures and have implications for the broader fields of computational linguistics and NLP.
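As a rough illustration of the simplest of the three pipelines named in the abstract (TF-IDF vectorization followed by a classifier), the sketch below uses only the Python standard library. It is not the authors' implementation: the abstract does not say which classifiers were used, so a nearest-centroid classifier with cosine similarity is substituted here, whitespace tokenization is an assumption, and the English training sentences are hypothetical placeholders rather than BTSD samples.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Assumption: plain whitespace tokenization; real Bengali text
    # would need a proper tokenizer.
    return sentence.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def normalize(vec):
    # Scale a sparse vector (dict) to unit L2 length.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf_vector(doc, idf):
    # Term count times IDF, ignoring terms unseen during fitting.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def fit_centroids(docs, labels, idf):
    # One unit-length centroid per class: the sum of its TF-IDF vectors.
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, w in tfidf_vector(doc, idf).items():
            sums[label][t] += w
    return {label: normalize(vec) for label, vec in sums.items()}


def predict(doc, idf, centroids):
    # Pick the class whose centroid has the highest cosine similarity.
    vec = tfidf_vector(doc, idf)

    def cosine(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hypothetical toy data (English placeholders, not taken from BTSD).
train = [
    ("the boy reads a book", "simple"),
    ("the girl sings a song", "simple"),
    ("the boy reads because he likes stories", "complex"),
    ("she left although it rained", "complex"),
    ("the boy reads and the girl sings", "compound"),
    ("he came but she left", "compound"),
]
docs = [d for d, _ in train]
labels = [l for _, l in train]
idf = fit_idf(docs)
centroids = fit_centroids(docs, labels, idf)

print(predict("the dog barks and the cat sleeps", idf, centroids))
```

On this toy data the coordinating conjunction "and" carries most of the signal for the compound class. A bag-of-words representation of this kind ignores word order and context, which is one plausible reason a contextual encoder such as BERT could score higher on a structural task like this one.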