Academic Paper

Multi-Image Visual Question Answering for Unsupervised Anomaly Detection

Document Type

Working Paper

Author

Li, Jun; Bercea, Cosmin I.; Müller, Philip; Felsner, Lina; Kim, Suhwan; Rueckert, Daniel; Wiestler, Benedikt; Schnabel, Julia A.

Source

Subject

Computer Science - Computer Vision and Pattern Recognition
Computer Science - Computation and Language

Language

Abstract

Unsupervised anomaly detection enables the identification of potential pathological areas by comparing original images with their pseudo-healthy reconstructions generated by models trained exclusively on normal images. However, the clinical interpretation of the resulting anomaly maps is challenging because they lack detailed, understandable explanations. Recent advances in language models have shown the capability of mimicking human-like understanding and providing detailed descriptions. This raises an interesting question: How can language models be employed to make the anomaly maps more explainable? To the best of our knowledge, we are the first to leverage a language model for unsupervised anomaly detection, for which we construct a dataset with different questions and answers. Additionally, we present a novel multi-image visual question answering framework tailored for anomaly detection, incorporating diverse feature fusion strategies to enhance visual knowledge extraction. Our experiments reveal that the framework, augmented by our new Knowledge Q-Former module, adeptly answers questions on the anomaly detection dataset. Moreover, integrating anomaly maps as inputs helps improve the detection of unseen pathologies.
Comment: 13 pages, 8 figures

Online Access

Open Access (arXiv)
