Academic Journal Article

Improving Endoscopic Image Analysis: Attention Mechanism Integration in Grid Search Fine-Tuned Transfer Learning Model for Multi-Class Gastrointestinal Disease Classification
Document Type
Periodical
Source
IEEE Access. 12:80345-80358, 2024
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Computational modeling
Gastrointestinal tract
Endoscopes
Cancer
Transfer learning
Optimization
Residual neural networks
DenseNet121
endoscopy
gastrointestinal diseases
grid search
hyperparameter optimization
InceptionResnetV2
InceptionV3
MobileNetV2
ResNet50
ResNet101
Xception
attention mechanism
Language
ISSN
2169-3536
Abstract
Gastrointestinal diseases are on the rise as lifestyles and dietary habits continue to change, with dietary changes a major contributor to a variety of bowel problems. Around two million people worldwide die from gastrointestinal (GI) diseases. Endoscopy is a medical imaging technology that helps diagnose GI diseases such as polyps and esophagitis. Manual diagnosis from endoscopic images is time-consuming; hence, computer-aided techniques are now widely used for accurate and fast GI disease diagnosis. In this paper, the Kvasir dataset of 4000 endoscopic images, comprising 500 images for each of eight gastrointestinal tract disease classes, is classified using seven grid search fine-tuned transfer learning models: ResNet101, InceptionV3, InceptionResNetV2, Xception, DenseNet121, MobileNetV2, and ResNet50. The grid search algorithm is used to determine the architectural and fine-tuning hyperparameters. The fine-tuned ResNet101 model performed best, with a learning rate of 0.001 and a batch size of 32 for the SGD optimizer at 40 epochs. These hyperparameters were optimized through grid search along with a new set of layers added to the model: one flatten layer, two dropout layers, and five dense layers. The grid search fine-tuned ResNet101 model obtained an accuracy of 0.90, a precision of 0.92, a recall of 0.92, and an F1-score of 0.91. The model was then integrated with an attention mechanism to enhance performance by focusing on essential image features, which matters in medical imaging, where some regions may contain vital diagnostic information. The proposed grid search fine-tuned, attention-integrated ResNet101 model achieved an accuracy of 0.935, a precision of 0.93, a recall of 0.94, and an F1-score of 0.93.
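The grid search step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the search space values, the `grid_search` helper, and the `toy_evaluate` scoring function are all assumptions made for illustration. In a real run, `evaluate` would train a fine-tuned model with the given configuration and return its validation accuracy; here a synthetic score stands in so the example runs without any deep learning library.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


def grid_search(
    space: Dict[str, List[Any]],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Exhaustively try every combination in `space` and keep the best one."""
    names = list(space)
    best_config: Config = {}
    best_score = float("-inf")
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space; the abstract does not list the candidate values.
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "batch_size": [16, 32, 64],
    "optimizer": ["sgd", "adam"],
}


def toy_evaluate(config: Config) -> float:
    """Synthetic placeholder score, NOT a real training result.

    Replace with a function that trains the model using `config`
    and returns a validation metric.
    """
    score = 0.0
    score -= abs(config["batch_size"] - 32) / 32
    score -= 0.0 if config["learning_rate"] == 0.001 else 0.5
    score += 0.1 if config["optimizer"] == "sgd" else 0.0
    return score


best_config, best_score = grid_search(SEARCH_SPACE, toy_evaluate)
print(best_config, best_score)
```

Because grid search evaluates every combination, its cost grows multiplicatively with each added hyperparameter (18 configurations in this sketch), so in practice the grid is usually kept small when each evaluation means a full training run.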