Academic Paper

GRU-Enhanced Decoding by Lightweight Transformer for Image Captioning
Document Type
Conference
Source
2024 14th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 407-410, Jan 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Robotics and Control Systems
Signal Processing and Analysis
Visualization
Production
Transformers
Feature extraction
Encoding
Decoding
Machine translation
Language
ISSN
2766-421X
Abstract
Image captioning is the task of generating descriptive sentences for an image by combining visual and textual information. Transformers, which use an encoder-decoder configuration, are widely applied to language understanding and machine translation. To obtain a small, lightweight model that is easy to deploy, we present the Lightweight Transformer with an embedded GRU decoder for image captioning. The model shrinks the standard architecture from a stack of encoders and decoders to a single encoder and a GRU-integrated decoder. In addition, multilevel rich visual features extracted with Inception V3 enhance the encoder's performance. We conducted extensive experiments on the VizWiz-Captions dataset to assess the effectiveness of the proposed Lightweight Transformer architecture.
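To give a rough sense of why collapsing the encoder and decoder stacks yields a lighter model, the sketch below counts trainable parameters for a conventional six-layer encoder and decoder stack and for a single encoder layer, a single decoder layer, and one GRU. Everything numeric here is an assumption, not a value from the paper: the widths `D_MODEL = 512` and `D_FF = 2048` and the six-layer baseline follow the commonly used Transformer configuration, the GRU is assumed to have input and hidden size equal to `D_MODEL` with two bias vectors per gate, and the way the GRU is wired into the decoder is not specified in the abstract. Embeddings, the output projection, and the Inception V3 backbone are left out. The function names are hypothetical.

```python
# Back-of-the-envelope parameter counts (illustrative assumptions only).

D_MODEL = 512   # assumed model width, not taken from the paper
D_FF = 2048     # assumed feed-forward width, not taken from the paper


def attention_params(d_model: int) -> int:
    """Q, K, V and output projections, each a d_model x d_model matrix plus bias."""
    return 4 * d_model * d_model + 4 * d_model


def feed_forward_params(d_model: int, d_ff: int) -> int:
    """Two linear layers (d_model -> d_ff -> d_model) with biases."""
    return 2 * d_model * d_ff + d_ff + d_model


def layer_norm_params(d_model: int) -> int:
    """Scale and shift vectors of one layer normalization."""
    return 2 * d_model


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Self-attention + feed-forward + two layer norms."""
    return (attention_params(d_model)
            + feed_forward_params(d_model, d_ff)
            + 2 * layer_norm_params(d_model))


def decoder_layer_params(d_model: int, d_ff: int) -> int:
    """Self-attention + cross-attention + feed-forward + three layer norms."""
    return (2 * attention_params(d_model)
            + feed_forward_params(d_model, d_ff)
            + 3 * layer_norm_params(d_model))


def gru_params(d_in: int, d_hidden: int) -> int:
    """Single-layer GRU: three gates, each with input weights,
    recurrent weights, and two bias vectors (assumed convention)."""
    return 3 * (d_in * d_hidden + d_hidden * d_hidden + 2 * d_hidden)


def baseline_params(num_layers: int = 6) -> int:
    """Conventional stack: num_layers encoder and num_layers decoder layers."""
    return num_layers * (encoder_layer_params(D_MODEL, D_FF)
                         + decoder_layer_params(D_MODEL, D_FF))


def lightweight_params() -> int:
    """One encoder layer, one decoder layer, and one GRU (assumed layout)."""
    return (encoder_layer_params(D_MODEL, D_FF)
            + decoder_layer_params(D_MODEL, D_FF)
            + gru_params(D_MODEL, D_MODEL))


if __name__ == "__main__":
    base = baseline_params()
    light = lightweight_params()
    print(f"baseline (6+6 layers): {base:,} parameters")
    print(f"lightweight (1+1+GRU): {light:,} parameters")
    print(f"ratio: {light / base:.2f}")
```

Under these assumed sizes the single-layer variant keeps roughly a fifth of the baseline's encoder and decoder parameters. The paper's actual dimensions and savings may differ.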