Academic Paper

Data Augmentation for Human Activity Recognition With Generative Adversarial Networks
Document Type
Periodical
Source
IEEE Journal of Biomedical and Health Informatics (IEEE J. Biomed. Health Inform.), 28(4):2350-2361, Apr. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Synthetic data
Generative adversarial networks
Human activity recognition
Data models
Training
Generators
Data augmentation
Accelerometry
generative adversarial neural networks
Language
ISSN
2168-2194
2168-2208
Abstract
Human Activity Recognition (HAR) applications currently need a large volume of data to generalize to new users and environments. However, labeled data is usually scarce, and recording new data is costly and time-consuming. Synthetically enlarging datasets with Generative Adversarial Networks (GANs) has been proposed, and it outperforms cropping, time-warping, and jittering applied to raw signals. Incorporating GAN-generated synthetic data into datasets has been shown to improve the accuracy of trained models. Nevertheless, there is currently no established optimal GAN architecture for generating accelerometry signals, nor a proper evaluation methodology for assessing signal quality or the accuracy obtained with synthetic data. This work is the first to propose conditional Wasserstein Generative Adversarial Networks (cWGANs) to generate synthetic HAR accelerometry signals. Furthermore, we calculate quality metrics from the literature and study the impact of synthetic data on a large HAR dataset involving 395 users. Results show that i) cWGAN outperforms the original Conditional Generative Adversarial Networks (cGANs), with 1D convolutional layers being appropriate for generating accelerometry signals, ii) the performance improvement from incorporating synthetic data is larger the smaller the dataset, and iii) the quantity of synthetic data required is inversely proportional to the quantity of real data.
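The abstract names cropping, time-warping, and jittering as the raw-signal baselines that GAN-based augmentation is compared against. The following is a minimal sketch of what those three transforms might look like on a single-axis accelerometry window, written with the Python standard library only. The function names, the default noise level, and the constant-speed form of time-warping are illustrative assumptions and are not taken from the paper; published time-warping variants often use a smoothly varying distortion rather than one fixed speed factor.

```python
import math
import random


def jitter(signal, sigma=0.05, rng=random):
    """Add zero-mean Gaussian noise to every sample (sigma is an assumed default)."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def crop(signal, length, rng=random):
    """Return a random contiguous window of `length` samples."""
    if length > len(signal):
        raise ValueError("crop length exceeds signal length")
    start = rng.randint(0, len(signal) - length)  # randint is inclusive at both ends
    return signal[start:start + length]


def time_warp(signal, speed=1.2):
    """Resample at a constant speed factor using linear interpolation.

    Simplified stand-in for time-warping: speed > 1 compresses the
    signal in time, speed < 1 stretches it.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n = len(signal)
    if n < 2:
        return list(signal)
    n_out = max(2, round((n - 1) / speed) + 1)
    warped = []
    for i in range(n_out):
        t = min(i * speed, n - 1)      # position on the original time axis
        lo = math.floor(t)
        hi = min(lo + 1, n - 1)
        frac = t - lo
        warped.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return warped
```

Each function takes and returns a plain list of floats, so the transforms can be chained (for example, `jitter(crop(window, 128))`) to produce several augmented variants of one recorded window. Passing a seeded `random.Random` instance as `rng` makes the output reproducible.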