Academic Paper

OBViT: A high-resolution remote sensing crop classification model combining OBIA and Vision Transformer
Document Type
Conference
Source
2023 11th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), pp. 1-6, Jul. 2023
Subject
Aerospace
Bioengineering
Computing and Processing
Geoscience
Robotics and Control Systems
Signal Processing and Analysis
Training
Semantic segmentation
Crops
Transformers
Robustness
Data models
Convolutional neural networks
Remote Sensing
Transformer
Crop Classification
Convolutional Neural Network
Language
Abstract
This study proposes a new object-based deep learning approach, OBViT (Object-based Vision Transformer), for crop classification using high-resolution remote sensing images. Compared with traditional machine learning methods and CNN-based remote sensing semantic segmentation models, the proposed model achieves higher classification accuracy and greater robustness. Applied to remote sensing crop classification in the Xingren region of Guizhou, China, OBViT reached a precision of 87.98%, a recall of 87.91%, and an F1-score of 86.56%. The study's main contribution is a semantic segmentation model structure suited to in-field crop classification, obtained by optimizing model components such as the activation function and optimizer and by using a multi-scale training strategy to improve performance. Specifically, the SLIC algorithm was used to segment remote sensing images into uniformly sized superpixel objects for model input, and multi-scale data augmentation was applied to each object to increase data diversity. A pre-trained Vision Transformer (ViT) replaced the CNN module of the OBCNN model, significantly improving classification accuracy; in addition, the Mish activation function replaced GELU, and the Adan optimizer replaced Adam, further improving semantic segmentation accuracy. Finally, the OBViT crop classification model was trained on manually annotated remote sensing datasets, and an object-based K-nearest-neighbor filtering algorithm was used for post-processing. Experimental results demonstrate that the OBViT model achieves good semantic segmentation performance on crop remote sensing monitoring datasets, with higher accuracy and robustness than other deep learning methods.
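The activation-function swap mentioned above (Mish in place of GELU) can be illustrated with the two functions' standard definitions: Mish(x) = x·tanh(softplus(x)) and GELU(x) = x·Φ(x), where Φ is the standard normal CDF. This is a minimal stand-alone sketch of the two formulas, not code from the paper:

```python
import math

def softplus(x):
    # numerically stable softplus: log(1 + e^x)
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mish(x):
    # Mish(x) = x * tanh(softplus(x))  -- smooth, non-monotonic
    return x * math.tanh(softplus(x))

def gelu(x):
    # exact GELU(x) = x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Both are near-linear for large positive x but differ in how much
# negative signal they let through, e.g. at x = -2:
print(f"mish(-2) = {mish(-2.0):+.4f}, gelu(-2) = {gelu(-2.0):+.4f}")
```

In frameworks such as PyTorch, both are available as built-in modules (`torch.nn.Mish`, `torch.nn.GELU`), so the swap is a one-line change in the model definition.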
The land cover categories in the experimental dataset were tobacco, corn, barley, rice, artificial buildings, and "other". The study also conducted an interpretability analysis, revealing that OBViT better captures the spatial structure and texture features of the different target categories. Future research can apply OBViT to more complex remote sensing image data and further optimize the model structure and training strategies to improve accuracy. Additionally, OBViT can be extended to other remote sensing semantic segmentation tasks in fields such as agricultural pest monitoring and analysis.
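The object-based K-nearest-neighbor post-processing step described in the abstract can be sketched as a majority vote over neighboring objects' predicted labels, which smooths out isolated misclassifications. The `(centroid, label)` representation and the function name below are hypothetical illustrations, not the paper's actual data structures:

```python
from collections import Counter

def knn_label_filter(objects, k=3):
    """Relabel each superpixel object by majority vote among its k
    nearest neighbors (by centroid distance).

    `objects` is a list of ((x, y), label) pairs -- an assumed,
    simplified representation of segmented image objects.
    """
    filtered = []
    for i, ((xi, yi), _) in enumerate(objects):
        # squared centroid distances to every other object
        neighbours = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, lab)
            for j, ((xj, yj), lab) in enumerate(objects)
            if j != i
        )[:k]
        votes = Counter(lab for _, lab in neighbours)
        filtered.append(votes.most_common(1)[0][0])
    return filtered

# A lone "tobacco" object surrounded by "corn" objects is smoothed out:
objs = [((0, 0), "corn"), ((0, 1), "corn"), ((1, 0), "corn"),
        ((1, 1), "tobacco"), ((2, 1), "corn"), ((0, 2), "corn")]
print(knn_label_filter(objs, k=3))
```

This brute-force version is O(n²) in the number of objects; a spatial index (e.g. a k-d tree) would be the natural choice at scale.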