Academic Paper

MIFTP: A Multimodal Multi-Level Independent Fusion Framework with Improved Twin Pyramid for Multilabel Chest X-Ray Image Classification
Document Type
Conference
Source
2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 1112-1119, Oct. 2022
Subject
Bioengineering
Computing and Processing
Robotics and Control Systems
Semantics
MIMICs
Interference
Thorax
Artificial intelligence
X-ray imaging
Biomedical imaging
multi-level fusion
multi-label image classification
feature pyramid
multimodality
Language
ISSN
2375-0197
Abstract
In image-text multimodal image classification, fusion is usually performed between text features and image features. Conventional fusion assumes that image features from multiple network layers are fused jointly and synchronously with the text features. In fact, image features from different layers interact with text features in different ways, and these interactions interfere with one another in conventional fusion. Moreover, because low-level image features are semantically weak, fusing them with text features is less effective than fusing high-level image features with text features. To solve these problems, this paper proposes a framework of multi-level independent fusion between text features and image features of different levels. In this framework, the fusions between text features and multi-level image features are conducted asynchronously and independently of each other. To improve fusion efficiency when text features are fused with low-level image features, the method supplements the low-level image features with semantic information through a Twin Pyramid (TP) module, which propagates semantic information top-down to them. Extensive experiments on the MIMIC-CXR dataset demonstrate that multi-level independent fusion can effectively concatenate image features and text features and outperforms traditional methods.
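The two ideas in the abstract, top-down semantic propagation and per-level independent fusion, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the internals of the TP module, so plain element-wise addition stands in for it here. The pooled feature vectors, the averaging of per-level logits, and all function names (`top_down`, `fuse_level`, `independent_fusion`) are assumptions made for illustration. Concatenation is used as the fusion operator because the abstract mentions it.

```python
# Illustrative sketch only. Pooled feature vectors stand in for feature maps,
# and every name below is hypothetical, not taken from the paper.

def top_down(levels):
    """Propagate semantics from high-level to low-level features.

    `levels` is ordered low -> high and all vectors share one length
    (an assumption). Each lower level receives the already enriched
    level above it by element-wise addition.
    """
    enriched = [list(v) for v in levels]
    for i in range(len(enriched) - 2, -1, -1):
        enriched[i] = [a + b for a, b in zip(enriched[i], enriched[i + 1])]
    return enriched


def fuse_level(image_vec, text_vec, weights):
    """Fuse one image level with the text features by concatenation,
    then score each label with that level's own weight rows."""
    joint = image_vec + text_vec  # list concatenation
    return [sum(w * x for w, x in zip(row, joint)) for row in weights]


def independent_fusion(levels, text_vec, per_level_weights):
    """Fuse every level with the text separately, then average the logits.

    Averaging is an assumed way to merge the independent branches.
    """
    enriched = top_down(levels)
    per_level = [
        fuse_level(v, text_vec, w)
        for v, w in zip(enriched, per_level_weights)
    ]
    n = len(per_level)
    return [sum(col) / n for col in zip(*per_level)]


# Toy inputs: three image levels (low, mid, high) and one text vector.
levels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [0.5, 0.5]
# One weight matrix per level: 2 labels x (2 image + 2 text) inputs.
weights = [[[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]] for _ in levels]
logits = independent_fusion(levels, text, weights)
```

Because each level has its own weight matrix and its own fusion step, the interaction between the text and one image level cannot disturb the interaction with another level, which is the property the abstract attributes to independent fusion. A real model would use learned convolutional features, upsampling between pyramid levels, and a sigmoid per label for multi-label output.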