Academic Paper

An Unseen Features Enhanced Text Classification Approach
Document Type
Conference
Source
2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, Jun 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Training
Vocabulary
Text categorization
Training data
Machine learning
Predictive models
Benchmark testing
Unseen features
Out-of-distribution
Text classification
Language
ISSN
2161-4407
Abstract
In this paper, we discuss features that first emerge during the prediction phase of a machine learning model, termed unseen features. Because unseen features are absent from the vocabulary of the trained model, standard machine learning approaches typically discard them during preprocessing. We introduce the idea of unseen features and a method for identifying and using them in classification tasks. Unseen features exist only at prediction time, and incorporating them directly would change the dimension of the feature vector that the trained model expects, so direct incorporation is not practical. Instead, the feature space of the training set is transformed into an embedding space, which facilitates the use of unseen features. The proposed approach is empirically evaluated using standard metrics on three benchmark datasets, under both natural and balanced class distributions, and on two text types, long texts (aka structured texts) and short texts (aka unstructured texts), with five distinct classification algorithms. The experimental findings confirm the effectiveness of using unseen features during a machine learning model's deployment phase. The proposed unseen-features-enhanced technique outperforms conventional approaches under both balanced and natural class distributions by a significant margin of at least 10%.
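The problem the abstract describes can be illustrated with a minimal sketch. It is not the authors' method, whose details the abstract does not give. It only shows, under stated assumptions, why a vocabulary-based vector drops unseen tokens and why a fixed-size embedding can retain them. All function names are hypothetical, and `toy_embedding` is a hash-based stand-in for a real pretrained word embedding.

```python
import hashlib


def build_vocabulary(train_docs):
    """Collect the sorted set of tokens observed in the training documents."""
    return sorted({tok for doc in train_docs for tok in doc.lower().split()})


def bag_of_words(doc, vocab):
    """Count vector over the training vocabulary; unseen tokens are silently dropped."""
    index = {tok: i for i, tok in enumerate(vocab)}
    vec = [0] * len(vocab)
    for tok in doc.lower().split():
        if tok in index:
            vec[index[tok]] += 1
    return vec


def unseen_features(doc, vocab):
    """Tokens of a prediction-time document that are absent from the vocabulary."""
    known = set(vocab)
    return [tok for tok in doc.lower().split() if tok not in known]


def toy_embedding(token, dim=8):
    """Assumption: stand-in for a pretrained embedding, derived from a hash."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:dim]]


def embed_document(doc, dim=8):
    """Average the token embeddings; the output size is fixed for any token."""
    tokens = doc.lower().split()
    if not tokens:
        return [0.0] * dim
    vecs = [toy_embedding(t, dim) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


train_docs = ["cheap flights to paris", "book cheap hotel rooms"]
test_doc = "cheap hostel in lisbon"

vocab = build_vocabulary(train_docs)
bow = bag_of_words(test_doc, vocab)        # only "cheap" is counted
unseen = unseen_features(test_doc, vocab)  # tokens the vocabulary rejects
emb = embed_document(test_doc)             # same length as for training docs
```

In this sketch the bag-of-words vector keeps only one of the four test tokens, while the embedded representation has the same dimension for training and test documents, so a classifier trained on embedded documents can consume inputs that contain unseen tokens.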