Academic Paper
Two Novel Approaches for Automatic Labelling in Semi-Supervised Methods
Document Type
Conference
Author
Source
2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, Jul. 2020
Subject
Language
ISSN
2161-4407
Abstract
In real-world classification problems, the amount of labelled data is usually limited, since manually labelling instances is hard or expensive. However, a natural limitation of a classification algorithm is that it needs a labelled set of reasonable size in order to achieve reasonable performance. One way to mitigate this problem is semi-supervised learning. Several semi-supervised approaches (e.g. self-training) have been proposed in the literature; they aim to train a classifier on only a few labelled instances and then apply a labelling process in which a large number of unlabelled instances is labelled and added to the labelled set. However, this approach can add unreliable instances to the labelled set, impairing the performance of a semi-supervised method. In other words, both the labelling step and the criterion used to select newly labelled instances for inclusion in the labelled set have an important effect on the performance of a semi-supervised method. In this paper, we propose two new approaches for automatic labelling in semi-supervised methods that use the prediction agreement of a pool of classifiers as the selection criterion. In addition, we compare them against two baselines: the standard self-training method and a variation of it called the Flexible Confidence Classifier. Overall, both proposed methods obtained significantly better predictive results than the two baselines over 40 classification datasets.
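The general labelling loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' exact algorithm; it is a hypothetical implementation of agreement-based selection, where an unlabelled instance is added to the labelled set only when every classifier in a small pool predicts the same class for it. The choice of classifiers, data, and number of rounds here is purely illustrative.

```python
# Sketch of agreement-based automatic labelling (illustrative only,
# not the method proposed in the paper): a pool of classifiers is
# trained on the small labelled set, and an unlabelled instance is
# moved into the labelled set only on unanimous prediction agreement.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Simulate scarce labels: keep only 30 labelled instances.
labelled_idx = rng.choice(len(X), size=30, replace=False)
mask = np.zeros(len(X), dtype=bool)
mask[labelled_idx] = True
X_lab, y_lab = X[mask], y[mask]
X_unl = X[~mask]

pool = [
    LogisticRegression(max_iter=1000),
    GaussianNB(),
    DecisionTreeClassifier(max_depth=3, random_state=0),
]

for _ in range(5):  # a few labelling rounds
    if len(X_unl) == 0:
        break
    # Retrain the pool on the current labelled set, predict the rest.
    preds = np.array([clf.fit(X_lab, y_lab).predict(X_unl) for clf in pool])
    agree = (preds == preds[0]).all(axis=0)  # unanimous agreement
    if not agree.any():
        break
    # Move unanimously labelled instances into the labelled set.
    X_lab = np.vstack([X_lab, X_unl[agree]])
    y_lab = np.concatenate([y_lab, preds[0][agree]])
    X_unl = X_unl[~agree]

print(len(X_lab))  # size of the enlarged labelled set
```

The selection criterion (full unanimity) is the strictest option; a real system might instead require agreement from a majority of the pool, trading labelling coverage against the risk of adding unreliable instances.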