Academic Paper

Self-Supervised Variational Autoencoder for Unsupervised Object Counting from Very-High-Resolution Satellite Imagery: Applications in Dwelling Extraction in FDP Settlement Areas
Document Type
Periodical
Source
IEEE Transactions on Geoscience and Remote Sensing (IEEE Trans. Geosci. Remote Sensing). 62:1-18 2024
Subject
Geoscience
Signal Processing and Analysis
Training
Task analysis
Annotations
Image reconstruction
Location awareness
Satellite images
Anomaly detection
Anomaly
cutPaste
dwelling counting
latent space conditioning
localization
self-supervision
unsupervised learning
variational autoencoder (VAE)
Language
ISSN
0196-2892
1558-0644
Abstract
In supervised learning, deep learning models demand a large corpus of annotated data for object detection and classification tasks, which constrains their utility in humanitarian emergency response. To overcome this problem, we propose an unsupervised approach to dwelling counting from very-high-resolution (VHR) satellite imagery that combines a variational autoencoder (VAE) with anomaly detection. When applying a VAE to Earth observation images for dwelling localization and counting, we observed two critical limitations: 1) the trade-off between reconstruction quality and a good latent code, where favoring good reconstruction of dwellings leads to weak anomaly score maps that fail to properly localize dwellings, and 2) limited spatiotemporal invariance of the learned latent code, so that a model trained with datasets obtained from different geographies and times fails to properly localize dwellings. For the first problem, we introduced self-supervision by creating synthetic anomalies. For the second problem, we introduced latent space conditioning. The approach is tested on nine VHR images obtained from six forcibly displaced people (FDP) settlement areas. Results indicate that combining a VAE with an anomaly detection approach reaches an area under the receiver operating characteristic curve (AUROC) ranging from 0.70 in complex settlements to 0.98 in relatively less complex settlement areas. Similarly, the mean absolute error (MAE) for dwelling counting ranges from 56.67 to 5.03. Joint training on the combined datasets with latent space conditioning and self-supervision yielded better results than the classical VAE, with improved spatiotemporal transferability of the model and crisper, stronger anomaly maps. The implementation code will be available at https://github.com/getch-geohum/SSL-VAE.
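As a rough illustration of the anomaly-based counting idea described in the abstract, the following minimal sketch scores each pixel by its squared reconstruction error, counts connected regions above a threshold, and evaluates counts with MAE. It is not the authors' implementation: the function names, the fixed threshold, the 4-connectivity counting rule, and the all-background "reconstruction" standing in for a VAE output are all assumptions made for this example.

```python
from collections import deque


def anomaly_map(image, reconstruction):
    """Per-pixel squared reconstruction error for 2-D grayscale grids (lists of lists)."""
    return [
        [(a - b) ** 2 for a, b in zip(row_x, row_r)]
        for row_x, row_r in zip(image, reconstruction)
    ]


def count_blobs(score_map, threshold):
    """Count 4-connected regions whose score exceeds `threshold`.

    Hypothetical counting rule: each connected high-score region is one object.
    """
    h, w = len(score_map), len(score_map[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if seen[i][j] or score_map[i][j] <= threshold:
                continue
            # New region found: flood-fill it with a breadth-first search.
            count += 1
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and score_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def mean_absolute_error(predicted, reference):
    """Mean absolute difference between predicted and reference counts."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Toy scene: two bright "dwellings" on a dark background.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# Assumed stand-in for a model that reconstructs only the background.
reconstruction = [[0] * 5 for _ in range(4)]

scores = anomaly_map(image, reconstruction)
n_dwellings = count_blobs(scores, threshold=0.5)
mae = mean_absolute_error([n_dwellings, 5], [3, 5])
```

In this toy scene the model reconstructs background well and dwellings poorly, so the dwellings stand out in the score map. This is the behaviour the abstract says is lost when a VAE reconstructs dwellings too faithfully.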