Academic Paper

Enhanced Deep Predictive Modeling of Wastewater Plants With Limited Data
Document Type
Periodical
Source
IEEE Transactions on Industrial Informatics (IEEE Trans. Ind. Inf.), 20(2):1920-1930, Feb. 2024
Subject
Power, Energy and Industry Applications
Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
Data models
Predictive models
Biological system modeling
Wastewater
Computational modeling
Transfer learning
Training data
Data augmentation
deep learning (DL)
soft sensor
transfer learning (TL)
wastewater
Language
ISSN
1551-3203
1941-0050
Abstract
Deep learning is widely used in industrial process monitoring, control, and optimization. However, its applications in the wastewater industry remain underexplored, because deep learning requires a large amount of labeled training data to induce effective predictive models. Owing to the high cost of sensors and the limited frequency of, and delays in, sampling and laboratory analysis, wastewater treatment process data can be sparse and recorded at varying frequencies. One option for addressing training data limitations is transfer learning. However, owing to the large covariate shift between the commonly adopted source domains for transfer learning and the target domain of wastewater processes, this approach leads to unacceptable performance. We address this issue by proposing a novel synthetic data generation method for deep predictive modeling of wastewater plants. Employing a Markov process that uses a random walk, our technique enables the generation of abundant annotated data for our target domain. The method preserves the temporal dynamics and distribution of the original data, thereby closely mimicking the potential original samples of the domain. We extensively evaluate our method on two different high-rate-algae-based treatment datasets, demonstrating considerable performance gains over existing transfer learning approaches. Our proposed algorithm can help plant operators deploy responsive supportive models with limited data.
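The abstract does not give the details of the algorithm, so the following is only a minimal illustrative sketch of one way a random-walk Markov process could generate additional annotated samples from a short, time-ordered dataset. It is not the authors' method. The function name `random_walk_augment`, the parameters `step_probs` and `jitter`, and the choice to walk over sample indices are all assumptions made for illustration.

```python
import random


def random_walk_augment(samples, n_steps, step_probs=(0.25, 0.5, 0.25),
                        jitter=0.0, seed=None):
    """Hypothetical sketch: build a synthetic labeled sequence by letting a
    first-order Markov chain random-walk over the indices of `samples`.

    samples    : list of (features, label) pairs, ordered in time
    n_steps    : number of synthetic samples to emit
    step_probs : assumed probabilities of moving back, staying, or moving forward
    jitter     : assumed std. dev. of Gaussian noise added to each feature
    """
    rng = random.Random(seed)
    n = len(samples)
    idx = rng.randrange(n)  # random initial state
    synthetic = []
    for _ in range(n_steps):
        features, label = samples[idx]
        # Emit the observed sample, optionally perturbed; the label is kept as-is.
        noisy = [x + rng.gauss(0.0, jitter) for x in features]
        synthetic.append((noisy, label))
        # The next state depends only on the current one (Markov property):
        # step to a temporal neighbor, clamped at the ends of the series.
        move = rng.choices((-1, 0, 1), weights=step_probs)[0]
        idx = min(max(idx + move, 0), n - 1)
    return synthetic


# Usage with made-up numbers: 10 original samples expanded to 200 synthetic ones.
original = [([float(i)], float(i) * 2) for i in range(10)]
augmented = random_walk_augment(original, n_steps=200, seed=0)
print(len(augmented))
```

Because each step moves only to a neighboring time index, consecutive synthetic samples stay close to each other in the original ordering, which is one simple way to keep local temporal structure while drawing values from the observed data.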