Journal Article

Efficient UNet fusion of convolutional neural networks and state space models for medical image segmentation
Document Type
Article
Source
Digital Signal Processing, Volume 158, March 2025
Language
English
ISSN
1051-2004
Abstract
In current medical image segmentation research, convolutional neural networks (CNNs) excel at local feature extraction but struggle with global context modeling. Although the self-attention mechanism in Transformers effectively captures global dependencies, its quadratic time complexity limits its efficient application to large-scale medical image datasets. To address these issues and develop a high-precision, lightweight model, this study proposes an innovative model, CNS-UNet (Combined Neural Network and State Space Model in UNet), which integrates the State Space Model (SSM) architecture with CNNs. This combination allows the model to achieve effective global information modeling while maintaining low computational complexity. We adopt a U-shaped encoder-decoder framework, integrating newly developed Double Visual State Space (DVSS) and Residual Convolution (Res-Conv) modules as dual encoders for feature extraction. Additionally, we design a Cross-Fusion Module (CFM) to integrate the global and local features from the dual encoders, and we incorporate a Lightweight Attention Gate (LAG) mechanism to enhance the recognition of key features and filter out irrelevant information. Experimental results show that CNS-UNet achieves mean Intersection over Union (mIoU) scores of 89.67%, 95.18%, and 87.96% on the Kvasir-SEG, CVC-ClinicDB, and ISIC2018 datasets, respectively, while reducing model parameters by 71.8% compared to UNet. These results validate the unique advantages of CNS-UNet in medical image segmentation tasks and highlight its potential for broad application.
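The abstract describes the architecture only at a high level. As a reading aid, the following minimal PyTorch sketch shows how such a dual-encoder stage could be wired together; it is not the paper's implementation. In particular, DVSSPlaceholder merely stands in for the Double Visual State Space block (the abstract does not specify its state-space scan internals), and all module names, channel widths, kernel sizes, and fusion details here are assumptions.

# Illustrative sketch only: the internals of DVSS, CFM, and LAG below are
# assumptions inferred from the abstract, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResConv(nn.Module):
    """Residual Convolution (Res-Conv) block: the local-feature encoder branch."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out))
        self.skip = nn.Conv2d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):
        return F.relu(self.body(x) + self.skip(x))

class DVSSPlaceholder(nn.Module):
    """Stand-in for the Double Visual State Space (DVSS) block. A real DVSS
    would run a state-space scan over the feature map; the gated large-kernel
    depthwise convolution here only mimics its global-mixing role."""
    def __init__(self, c):
        super().__init__()
        self.norm = nn.GroupNorm(1, c)                      # LayerNorm-like
        self.mix = nn.Conv2d(c, c, 7, padding=3, groups=c)  # spatial token mixing
        self.gate = nn.Conv2d(c, c, 1)

    def forward(self, x):
        h = self.norm(x)
        return x + self.mix(h) * torch.sigmoid(self.gate(h))

class CrossFusion(nn.Module):
    """Cross-Fusion Module (CFM): merges global (SSM) and local (CNN) features."""
    def __init__(self, c):
        super().__init__()
        self.proj = nn.Conv2d(2 * c, c, 1)

    def forward(self, g, l):
        return self.proj(torch.cat([g, l], dim=1))

class LightweightAttentionGate(nn.Module):
    """LAG: weights skip features by a decoder-conditioned spatial attention
    map, suppressing irrelevant regions before the decoder consumes them."""
    def __init__(self, c):
        super().__init__()
        self.att = nn.Sequential(
            nn.Conv2d(2 * c, c // 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // 2, 1, 1), nn.Sigmoid())

    def forward(self, skip, dec):
        return skip * self.att(torch.cat([skip, dec], dim=1))

# Smoke test: one dual-branch encoder stage at a single scale.
x = torch.randn(1, 3, 64, 64)
stem = nn.Conv2d(3, 32, 3, padding=1)
g = DVSSPlaceholder(32)(stem(x))   # global (SSM-style) branch
l = ResConv(3, 32)(x)              # local (CNN) branch
fused = CrossFusion(32)(g, l)
out = LightweightAttentionGate(32)(fused, fused)  # real decoder features would replace the 2nd argument
print(out.shape)                   # torch.Size([1, 32, 64, 64])

In the full model, a stage like this would presumably be repeated at each resolution of the U-shaped encoder, with the CFM output feeding both the next stage and the LAG-gated skip connections; the reduced parameter count quoted in the abstract would come from the lightweight gating and the SSM branch's linear-time global mixing rather than quadratic self-attention.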