Academic Paper

Masked Image Modeling Based on Momentum Contrast
Document Type
Conference
Source
2024 4th International Conference on Neural Networks, Information and Communication (NNICE), pp. 421-425, Jan 2024
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Computational modeling
Semantics
Merging
Self-supervised learning
Feature extraction
Robustness
Decoding
masked image modeling
contrastive learning
self-supervised learning
Language
Abstract
Constrained by the design of the pretext task and the reliance on a single decoder, current Masked Image Modeling methods fail to achieve satisfactory global semantic representations in downstream tasks and to discern contextual information. To tackle these issues, we propose a masked image modeling framework based on momentum contrast. First, we introduce a multi-scale hidden layer fusion module to capture nuanced features. Then, a dense reconstruction decoder is designed to capture contextual information through reconstruction, while a global mapping decoder models global semantic information. In addition, the target branch incorporates a momentum encoder to update its parameters and prevent model collapse. Furthermore, it extracts features from the remaining sections to stabilize the model. Finally, a contrastive loss is computed between the global features and the features obtained from the target branch. Experimental results on four benchmark datasets show that the proposed method effectively improves classification accuracy under both fine-tuning and linear classification, demonstrating its strong generalization and representation capabilities.
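The abstract names two mechanisms without giving formulas: a momentum-updated target branch and a contrastive loss between global features and target-branch features. The sketch below illustrates both in the standard momentum-contrast form: an exponential moving average of the parameters and an InfoNCE loss. It is an assumption that the paper uses these exact forms. The function names, the momentum coefficient `m`, and the `temperature` value are hypothetical and do not come from the record.

```python
import math


def momentum_update(target_params, online_params, m=0.99):
    """EMA update of the target branch: target <- m * target + (1 - m) * online.

    Assumption: parameters are flat lists of floats; m is a hypothetical value.
    """
    return [m * t + (1.0 - m) * o for t, o in zip(target_params, online_params)]


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce_loss(queries, keys, temperature=0.2):
    """Mean InfoNCE loss where queries[i] and keys[i] form the positive pair.

    Assumption: queries stand in for the global features of the online branch,
    keys for the features from the momentum (target) branch.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(z - max_logit) for z in logits)
        )
        # Cross-entropy with the matching key as the correct class.
        total += log_denominator - logits[i]
    return total / len(q)
```

In this form the target branch receives no gradient of its own and only tracks the online encoder slowly, which is the usual argument for why a momentum encoder helps avoid collapse. The loss is small when each global feature is closest to its own target-branch feature and large when it is closer to another sample's feature.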