Academic Paper

Visual Question Answering in the Medical Domain
Document Type
Conference
Source
2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 379-386, Nov. 2023
Subject
Computing and Processing
General Topics for Engineers
Geoscience
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Visualization
Computational modeling
Self-supervised learning
Question answering (information retrieval)
Cognition
Task analysis
Biomedical imaging
Computer Vision
Natural Language Processing
Medical Visual Question Answering
Convolutional Neural Network
Recurrent Neural Network
Transformers
Computed Tomography
Magnetic Resonance Imaging
Abstract
Medical visual question answering (Med-VQA) is a machine learning task that aims to create a system that can answer natural language questions based on given medical images. Although there has been rapid progress on the general VQA task, less progress has been made on Med-VQA due to the lack of large-scale annotated datasets. In this paper, we present domain-specific pre-training strategies, including a novel contrastive learning pre-training method, to mitigate the problem of small datasets for the Med-VQA task. We find that the model benefits from components that use fewer parameters. We also evaluate and discuss the model’s visual reasoning using evidence verification techniques. Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set, giving comparable results to other state-of-the-art Med-VQA models.