Journal Article

Vision Transformers, Ensemble Model, and Transfer Learning Leveraging Explainable AI for Brain Tumor Detection and Classification
Document Type
Periodical
Source
IEEE Journal of Biomedical and Health Informatics, 28(3):1261-1272, Mar. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Tumors
Brain modeling
Magnetic resonance imaging
Cancer
Feature extraction
Computational modeling
Residual neural networks
Brain tumor classification
deep learning
ensemble learning
multiclass classification
transfer learning
VGG16
VGG19
InceptionV3
Xception
ResNet50
InceptionResNetV2
explainable AI
LIME
vision transformers
SWIN
CCT
EANet
Language
English
ISSN
Print ISSN: 2168-2194
Electronic ISSN: 2168-2208
Abstract
The abnormal growth of malignant or nonmalignant tissues in the brain causes long-term damage to the brain. Magnetic resonance imaging (MRI) is one of the most common methods of detecting brain tumors. To determine whether a patient has a brain tumor, MRI scans are manually examined by experts after they are acquired. MRI images examined by different specialists may yield inconsistent results, since professionals formulate their evaluations differently. Furthermore, merely identifying a tumor is not enough: to begin treatment as soon as possible, it is equally important to determine the type of tumor the patient has. In this paper, we consider the multiclass classification of brain tumors, since significant work has already been done on binary classification. To detect tumors faster, with less bias, and more reliably, we investigated the performance of several deep learning (DL) architectures, including Visual Geometry Group 16 (VGG16), InceptionV3, VGG19, ResNet50, InceptionResNetV2, and Xception. Following this, we propose a transfer learning (TL) based multiclass classification model called IVX16, built from the three best-performing TL models. We use a dataset consisting of a total of 3264 images. Through extensive experiments, we achieve peak accuracies of 95.11%, 93.88%, 94.19%, 93.88%, 93.58%, 94.5%, and 96.94% for VGG16, InceptionV3, VGG19, ResNet50, InceptionResNetV2, Xception, and IVX16, respectively. Furthermore, we use Explainable AI to evaluate the performance and validity of each DL model, and we implement the recently introduced Vision Transformer (ViT) models and compare their outputs with those of the TL and ensemble models.
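The abstract describes an ensemble (IVX16) built from the three best-performing transfer-learning backbones. Below is a minimal sketch, not the authors' implementation, of how such a TL ensemble could be assembled in Keras: the 224x224 input size, the 4-class label set, the dense head, and the softmax-averaging scheme are all assumptions for illustration.

```python
# Sketch of an IVX16-style transfer-learning ensemble (assumed design, not the paper's code):
# three ImageNet-pretrained backbones with frozen weights, small trainable heads,
# and averaged softmax outputs for multiclass brain-tumor MRI classification.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16, InceptionV3, Xception

NUM_CLASSES = 4            # e.g., glioma, meningioma, pituitary, no tumor (assumed label set)
IMG_SHAPE = (224, 224, 3)  # assumed common input resolution

def build_branch(backbone_cls, inputs, name):
    """Frozen ImageNet backbone plus a small trainable classification head."""
    backbone = backbone_cls(include_top=False, weights="imagenet",
                            input_shape=IMG_SHAPE)
    backbone.trainable = False               # transfer learning: keep pretrained features fixed
    x = backbone(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    return layers.Dense(NUM_CLASSES, activation="softmax", name=name)(x)

inputs = layers.Input(shape=IMG_SHAPE)
branches = [
    build_branch(VGG16, inputs, "vgg16_out"),
    build_branch(InceptionV3, inputs, "inceptionv3_out"),
    build_branch(Xception, inputs, "xception_out"),
]
# Combine the three branches by averaging their softmax distributions
# (one plausible ensembling scheme; the paper may weight or fuse differently).
outputs = layers.Average(name="ivx16_out")(branches)
ensemble = Model(inputs, outputs, name="IVX16_sketch")

ensemble.compile(optimizer="adam",
                 loss="categorical_crossentropy",
                 metrics=["accuracy"])
# ensemble.fit(train_ds, validation_data=val_ds, epochs=...)  # supply the MRI dataset here
```

A per-branch prediction from this model can also be inspected with an explainability tool such as LIME, as the keywords suggest, by wrapping `ensemble.predict` as the classifier function.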