Academic Article

Dual-Branch Feature Fusion Network Based Cross-Modal Enhanced CNN and Transformer for Hyperspectral and LiDAR Classification
Document Type
Periodical
Source
IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1-5, 2024
Subject
Geoscience
Power, Energy and Industry Applications
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Feature extraction
Transformers
Laser radar
Convolution
Three-dimensional displays
Fuses
Training
Cross-modal enhanced CNN and Transformer
feature fusion
ground classification
hyperspectral image (HSI)
light detection and ranging (LiDAR)
Language
English
ISSN
1545-598X
1558-0571
Abstract
The joint classification of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has attracted considerable attention in the field of remote sensing. Integrating the advantages of the two data sources can provide precise data support and analytical decision-making for remote-sensing applications. However, due to the inherent differences in properties and semantic information between heterogeneous data, most existing deep-learning methods are suboptimal both at extracting the characteristic features of each data source and at exploiting their interactive information. In this letter, we propose a dual-branch feature fusion network-based cross-modal enhanced CNN and Transformer (DF2NCECT) to make full use of the respective features and interactive information of multisource data. DF2NCECT consists of two main stages. One is the basic feature extraction stage, which builds a hybrid convolution module based on 3DCNN and an inception structure to fully extract the joint features of HSI from multiple spatial perspectives. The other is the deep feature fusion stage, where the CNN and Transformer are designed in parallel to fully explore and fuse deep features between HSI and LiDAR. More importantly, to obtain effective interactive information between HSI and LiDAR, a cross-modal enhanced CNN and Transformer module (CECT) is designed to deeply enhance the fused interactive features from global and local perspectives. Experiments show that the proposed method outperforms the comparison methods by an average of 3.06% in OA on Houston2013 and by 1.79% on Summer.
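The abstract's deep feature fusion stage exchanges information between the HSI and LiDAR branches. The paper's exact CECT module is not reproduced here, but the cross-modal idea can be illustrated with a minimal, dependency-free sketch: tokens from one modality attend to tokens from the other via scaled dot-product attention, and the two enhanced streams are fused per token. All names (`cross_attention`, the toy feature vectors) are hypothetical and purely illustrative.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query vector attends over all
    key/value vectors (lists of lists), returning one vector per query."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Toy example: 2 HSI feature tokens and 2 LiDAR feature tokens, dim 3.
hsi = [[0.2, 0.5, 0.1], [0.9, 0.3, 0.4]]
lidar = [[0.7, 0.1, 0.6], [0.2, 0.8, 0.3]]

# Cross-modal enhancement: HSI tokens attend to LiDAR tokens and vice
# versa; each pair of enhanced tokens is fused by concatenation.
hsi_enh = cross_attention(hsi, lidar, lidar)
lidar_enh = cross_attention(lidar, hsi, hsi)
fused = [h + l for h, l in zip(hsi_enh, lidar_enh)]
print(len(fused), len(fused[0]))  # 2 fused tokens, each 6-dimensional
```

In the actual DF2NCECT architecture this interaction is carried out by parallel CNN and Transformer paths so that both local and global context shape the fused features; the sketch above captures only the attention-based exchange.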