Academic article

Copula-based synthetic data augmentation for machine-learning emulators
Document Type
article
Source
Geoscientific Model Development, Vol 14, Pp 5205-5215 (2021)
Subject
Geology
QE1-996.5
Language
English
ISSN
1991-959X
1991-9603
Abstract
Can we improve machine-learning (ML) emulators with synthetic data? If data are scarce or expensive to source and a physical model is available, statistically generated data may be useful for augmenting training sets cheaply. Here we explore the use of copula-based models for generating synthetically augmented datasets in weather and climate by testing the method on a toy physical model of downwelling longwave radiation and a corresponding neural network emulator. Results show that for copula-augmented datasets, predictions are improved by up to 62 % for the mean absolute error (from 1.17 to 0.44 W m−2).
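The general idea of copula-based data generation can be illustrated with a minimal sketch. The abstract does not state which copula family or software was used, so everything below is an assumption made for illustration: a bivariate Gaussian copula with empirical margins, the helper names (`normal_scores`, `pearson`, `empirical_quantile`, `sample_gaussian_copula`), and the toy input data are all hypothetical and are not the article's method.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def normal_scores(values):
    """Map a sample to normal scores via its ranks (pseudo-observations)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the pseudo-observation strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF, interpolating linearly between order statistics."""
    pos = u * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def sample_gaussian_copula(x, y, n_samples, rng):
    """Draw synthetic (x, y) pairs that mimic the dependence seen in the data."""
    # 1. Estimate the copula parameter from the normal scores of each margin.
    rho = pearson(normal_scores(x), normal_scores(y))
    xs, ys = sorted(x), sorted(y)
    synthetic = []
    for _ in range(n_samples):
        # 2. Draw a correlated pair of standard normals.
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = e1
        z2 = rho * e1 + math.sqrt(max(0.0, 1 - rho * rho)) * e2
        # 3. Map to uniforms, then back to the data scale via each margin.
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        synthetic.append((empirical_quantile(xs, u1), empirical_quantile(ys, u2)))
    return synthetic


# Hypothetical toy data: two positively dependent variables.
rng = random.Random(42)
x_obs = [rng.gauss(0, 1) for _ in range(200)]
y_obs = [math.exp(0.8 * xi + 0.3 * rng.gauss(0, 1)) for xi in x_obs]
augmented = sample_gaussian_copula(x_obs, y_obs, 1000, rng)
```

In an augmentation workflow of the kind the abstract describes, synthetic inputs such as `augmented` would presumably be passed through the physical model to obtain matching outputs before being added to the emulator's training set. Because the margins here are empirical, the sketch cannot generate values outside the observed range of each variable.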