Academic Article

Class-Guidance Network Based on the Pyramid Vision Transformer for Efficient Semantic Segmentation of High-Resolution Remote Sensing Images
Document Type
article
Source
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 16, pp. 5578-5589 (2023)
Subject
Class-guidance network
remote sensing images
semantic segmentation
transformer
Ocean engineering
TC1501-1800
Geophysics. Cosmic physics
QC801-809
Language
English
ISSN
2151-1535
Abstract
Small interclass differences and large intraclass variations in multicategory semantic segmentation are problems that the “encoder–decoder” structure of the fully convolutional neural network does not completely solve, leading to the imprecise perception of easily confused categories. To address this issue, in this article, we argue that sufficient contextual information can provide more interpretation clues to the model. Additionally, if we can mine the class-specific perceptual information for each semantic class, we can enhance the information belonging to the corresponding class in the decoding process. Therefore, we propose the class-guidance network based on the pyramid vision transformer (PVT). In detail, with the PVT as the encoder network, the subsequent decoding process is composed of three stages. First, we design a receptive field block that expands the receptive field to different degrees through parallel branches with different dilation rates. Second, we put forward a semantic guidance block that uses high-level features to guide the channel enhancement of low-level features. Third, we propose a class guidance block that achieves class-aware guidance of adjacent features and refines the segmentation in a progressive manner. Experimental results on the Potsdam and Vaihingen datasets show that the method achieves an overall accuracy of 88.91% and 88.87%, respectively.
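
The following is a minimal sketch of the receptive field block described in the abstract, assuming a PyTorch implementation. The branch count, the dilation rates (1, 2, 4, 8), and the 1x1 fusion convolution are illustrative assumptions; the abstract states only that parallel branches with different dilation rates are used.

import torch
import torch.nn as nn

class ReceptiveFieldBlock(nn.Module):
    """Parallel 3x3 branches with different dilation rates, fused by a 1x1 conv.

    Hypothetical sketch: the exact branch design in the paper may differ.
    """

    def __init__(self, in_channels: int, out_channels: int, rates=(1, 2, 4, 8)):
        super().__init__()
        # One branch per dilation rate; each sees a different receptive field.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        # 1x1 convolution to fuse the concatenated branch outputs.
        self.fuse = nn.Conv2d(len(rates) * out_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))

if __name__ == "__main__":
    rfb = ReceptiveFieldBlock(in_channels=64, out_channels=64)
    out = rfb(torch.randn(1, 64, 32, 32))
    print(out.shape)  # torch.Size([1, 64, 32, 32])

Because each 3x3 branch uses padding equal to its dilation rate, all branches preserve the spatial size of the input, so their outputs can be concatenated channel-wise before fusion.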