Academic Journal Article

Local Climate Zone Classification via Semi-Supervised Multimodal Multiscale Transformer
Document Type
Article
Source
IEEE Transactions on Geoscience and Remote Sensing; 2024, Vol. 62, Issue 1, pp. 1-17 (17 pp.)
ISSN
0196-2892; 1558-0644
Abstract
Local climate zone (LCZ) classification plays a critical role in urban environment research and has attracted extensive attention. However, the potential of deep learning-based approaches has not yet been fully explored in this field, even though neural networks continue to push the frontier in many applications. In this article, we propose a novel multimodal multiscale transformer (MM-Transformer) network for LCZ classification that introduces multiscale patch embedding (MPE) and multimodal fusion learning (MFL) into the transformer architecture. The proposed MPE captures hierarchical interrelationships among image contextual neighborhoods and automatically learns discriminative features. The proposed MFL enables the network to fuse multispectral and synthetic aperture radar (SAR) data under the guidance of the attention mechanism. To further improve classification accuracy, we apply semi-supervised learning (SemiSL) to mine information from unlabeled image data, so that labeled and pseudo-labeled data jointly drive the network updates. Experiments on the So2Sat LCZ42, CHN15-LCZ, and SouthKorea6-LCZ benchmark datasets demonstrate that the proposed approach significantly outperforms existing methods and achieves state-of-the-art performance. In the generated LCZ maps, urban and natural classes are well distinguished, and the urban structure with water bodies or mountains is well preserved. Finally, we discuss the impact of the sample receptive field and sample heterogeneity on LCZ classification performance, which offers a new direction for future LCZ classification studies.
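The abstract states that labeled and pseudo-labeled data jointly drive the network updates but does not say how pseudo-labels are chosen. As a rough illustration only, the sketch below shows one common scheme, confidence-thresholded pseudo-labeling. The function names, the 0.9 threshold, and the example logits are all assumptions made for this sketch, not details taken from the article, and the authors' actual SemiSL procedure may differ.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(unlabeled_logits, threshold=0.9):
    """Return (sample_index, predicted_class) pairs for confident samples.

    A sample is kept only if its highest class probability reaches the
    threshold (an assumed value); other samples stay unlabeled this round.
    """
    selected = []
    for index, logits in enumerate(unlabeled_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


# Hypothetical model outputs for three unlabeled patches over three classes.
logits = [[4.0, 0.0, 0.0], [1.0, 1.1, 0.9], [0.0, 0.0, 5.0]]
pseudo = select_pseudo_labels(logits, threshold=0.9)
```

Under this scheme, the selected pairs would be added to the labeled set for the next training round, while the ambiguous second patch is left out until the model becomes more confident about it.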