Academic Paper

Mining Influential Training Data by Tracing Influence on Hard Validation Samples
Document Type
Conference
Source
2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 167-173, Oct. 2022
Subject
Bioengineering
Computing and Processing
Robotics and Control Systems
Training
Deep learning
Training data
Benchmark testing
Data models
Complexity theory
Data mining
training data pruning
hard validation sample
influence value
ISSN
2375-0197
Abstract
The ever-growing size of deep learning models is constantly driven by the ever-growing size of datasets. Mining the influential training data offers a significant payoff: it reduces training time and model complexity, and can potentially increase model accuracy. In this paper, we propose several approaches, such as classifying the validation dataset into easy, medium, and hard levels, and introducing an influence value computed for each training sample with respect to the hard validation data, to co-prune the validation dataset and the training dataset. Empirically, we conclude that the hard portion of the validation data can be used to mine the most influential training data, thereby reducing the training dataset size by 50% without losing accuracy in our experiments.
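The abstract outlines the pipeline but not the exact influence formula. As a minimal sketch of the idea, the following toy example splits validation samples into easy/medium/hard by per-sample loss and scores each training sample with a TracIn-style gradient dot product against the hard validation set; the synthetic data, the logistic-regression model, and the gradient-dot-product proxy are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data (a stand-in for the paper's benchmarks).
n_train, n_val, dim = 200, 60, 5
X_train = rng.normal(size=(n_train, dim))
w_true = rng.normal(size=dim)
y_train = (X_train @ w_true + 0.5 * rng.normal(size=n_train) > 0).astype(float)
X_val = rng.normal(size=(n_val, dim))
y_val = (X_val @ w_true > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train a simple logistic-regression model with gradient descent.
w = np.zeros(dim)
for _ in range(300):
    grad = X_train.T @ (sigmoid(X_train @ w) - y_train) / n_train
    w -= 0.5 * grad

# Step 1: split validation samples into easy / medium / hard by per-sample loss.
p_val = sigmoid(X_val @ w)
val_loss = -(y_val * np.log(p_val + 1e-12)
             + (1 - y_val) * np.log(1 - p_val + 1e-12))
easy_cut, hard_cut = np.quantile(val_loss, [1 / 3, 2 / 3])
hard_idx = np.where(val_loss >= hard_cut)[0]

# Step 2: influence value of each training sample on the hard validation set,
# approximated here as a first-order gradient dot product: a positive score
# means a gradient step on that sample also lowers the hard-validation loss.
train_grads = (sigmoid(X_train @ w) - y_train)[:, None] * X_train
hard_grad = ((sigmoid(X_val[hard_idx] @ w) - y_val[hard_idx])[:, None]
             * X_val[hard_idx]).mean(axis=0)
influence = train_grads @ hard_grad

# Step 3: prune, keeping the 50% of training samples with highest influence.
keep = np.argsort(influence)[::-1][: n_train // 2]
print(f"hard validation samples: {len(hard_idx)}, kept: {len(keep)}")
```

The 50% retention ratio mirrors the reduction reported in the abstract; in practice the cutoff between "influential" and "prunable" training data would be tuned on held-out accuracy.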