Academic paper

Multi-modal Diffusion Network with Controllable Variability for Medical Image Segmentation
Document Type
Conference
Source
2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 3817-3822, Dec. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Image segmentation
Visualization
Uncertainty
Semantics
Noise reduction
Stochastic processes
Contrastive learning
Gaussian distribution
Diffusion models
Biomedical imaging
Medical image segmentation
Multi-modal diffusion network
Controllable variability
Language
English
ISSN
2156-1133
Abstract
In diffusion-based medical segmentation models, stochastic sampling is commonly used to generate multiple masks. However, the inherent variability in diffusion models can introduce significant biases into some masks, causing the fused mask to deviate from the true mask. In this study, we propose a novel multi-modal diffusion segmentation network (MMDSN) with controllable variability, specifically designed to address the issue of variability in diffusion models. MMDSN achieves multi-modal conditional control through medical text annotations, thereby enhancing the consistency of visual semantic representations and establishing a correspondence between vision and language for diffusion models. Additionally, MMDSN constrains the uncertainty distributions of multiple timesteps within the latent Gaussian space, controlling the variability at each denoising timestep. Extensive experiments on the Qata-Covid19 and MosMed datasets demonstrate that our proposed method surpasses existing state-of-the-art diffusion networks, producing a high-quality, controllable segmentation map with just a single reverse diffusion step and a single sampling pass.
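The abstract's idea of constraining per-timestep uncertainty within a latent Gaussian space can be illustrated with a KL-divergence penalty toward a standard normal. The sketch below is a minimal, hypothetical illustration, not the paper's actual implementation: the per-timestep latent statistics (`mu`, `sigma`) and the timestep schedule are made-up stand-ins for quantities a trained network would predict.

```python
import numpy as np

def gaussian_kl(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), averaged over elements."""
    return float(np.mean(0.5 * (mu**2 + sigma**2 - 1.0) - np.log(sigma)))

# Hypothetical per-timestep latent statistics (in MMDSN these would come
# from the network; the values below are illustrative only).
timesteps = [0.25, 0.50, 0.75, 1.00]
stats = {t: (0.1 * t * np.ones(16), (1.0 + 0.5 * t) * np.ones(16))
         for t in timesteps}

# Summing the KL penalty over timesteps and adding it to the training loss
# pulls every intermediate latent toward a standard Gaussian, bounding the
# variability injected at each denoising step.
penalty = sum(gaussian_kl(mu, sigma) for mu, sigma in stats.values())
print(f"total KL penalty across timesteps: {penalty:.4f}")
```

The penalty is zero exactly when every latent already matches the standard normal, so minimizing it during training limits how far the stochastic sampling at each step can drift, which is one plausible route to the low-variability, single-step sampling the abstract reports.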