Academic Paper
Multi-modal Diffusion Network with Controllable Variability for Medical Image Segmentation
Document Type
Conference
Source
2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 3817-3822, Dec. 2024
ISSN
2156-1133
Abstract
In diffusion-based medical segmentation models, stochastic sampling is commonly used to generate multiple masks. However, the inherent variability of diffusion models can introduce significant biases into some masks, causing the fused mask to deviate from the true mask. In this study, we propose a novel multi-modal diffusion segmentation network (MMDSN) with controllable variability, specifically designed to address the issue of variability in diffusion models. MMDSN achieves multi-modal conditional control through medical text annotations, thereby enhancing the consistency of visual semantic representations and establishing a correspondence between vision and language for diffusion models. Additionally, MMDSN constrains the uncertainty distributions of multiple timesteps within a latent Gaussian space, controlling the variability at each denoising timestep. Extensive experiments on the Qata-Covid19 and MosMed datasets demonstrate that our proposed method surpasses existing state-of-the-art diffusion networks, producing high-quality, controllable segmentation maps with just a single reverse diffusion step and a single sampling pass.
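The two mechanisms the abstract describes, text-conditioned denoising with a one-step reverse reconstruction and a Gaussian constraint on per-timestep uncertainty, can be sketched in miniature. This is a hedged toy illustration, not the paper's implementation: `toy_denoiser`, the linear noise predictor, and the scalar text conditioning are all hypothetical stand-ins for MMDSN's actual network; only the DDPM-style closed-form recovery of x0 from x_t and the KL-to-standard-normal penalty are standard formulas.

```python
import numpy as np

def toy_denoiser(x_t, text_emb, w=0.1):
    """Hypothetical noise predictor: a linear map on the noisy latent,
    shifted by a scalar summary of the text-annotation embedding.
    (Stand-in for the paper's multi-modal conditional network.)"""
    return w * x_t + text_emb.mean()

def single_step_reverse(x_t, text_emb, alpha_bar_t):
    """One-shot reverse diffusion: recover x0 from x_t in a single step
    via the DDPM identity x0 = (x_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t),
    then threshold to a binary segmentation mask."""
    eps_hat = toy_denoiser(x_t, text_emb)
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    return (np.clip(x0_hat, 0.0, 1.0) > 0.5).astype(np.float32)

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) summed over dimensions -- the kind of
    per-timestep penalty that pulls uncertainty toward a latent Gaussian."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_t = rng.normal(size=(8, 8))        # noisy latent "mask"
    text_emb = rng.normal(size=(16,))    # toy text-annotation embedding
    mask = single_step_reverse(x_t, text_emb, alpha_bar_t=0.5)
    print(mask.shape)
```

Applying the same KL penalty at every denoising timestep (rather than only at the endpoint) is what gives each intermediate distribution a common Gaussian anchor, which is the abstract's stated route to controlling sample-to-sample variability.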