Academic Article

Visual World to an Audible Experience: Visual Assistance for the Blind And Visually Impaired
Document Type
Conference
Source
2020 IEEE 17th India Council International Conference (INDICON), pp. 1-6, Dec. 2020
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Visualization
Webcams
Blindness
Real-time systems
Integrated circuit modeling
Task analysis
Long short-term memory
Deep Learning
Visual Question Answering
Image Captioning
Real Time
Language
ISSN
2325-9418
Abstract
This paper aims to assist visually impaired people through Deep Learning (DL) by providing a system that can describe the user's surroundings and answer questions about them. The system consists mainly of two models: an Image Captioning (IC) model and a Visual Question Answering (VQA) model. The IC model is a Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based architecture that incorporates a form of attention while captioning. For the VQA task, the paper proposes two models, one Multi-Layer Perceptron (MLP) based and one Long Short-Term Memory (LSTM) based, that answer questions related to the input image. The IC model achieved an average BLEU-1 score of 0.46, and the LSTM-based VQA model achieved an overall accuracy of 47 percent. These two models are integrated with Speech-to-Text and Text-to-Speech components to form a single system that works in real time.
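
The abstract gives no implementation details, but a minimal sketch of the kind of LSTM-based VQA model it describes might look like the following. It assumes pre-extracted CNN image features (e.g., a 4096-dimensional VGG fully connected layer), element-wise fusion of the image and question encodings, and classification over a fixed answer vocabulary; the class name LSTMVQA, all layer sizes, and the fusion operator are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (NOT the paper's code) of an LSTM-based VQA model:
# a CNN image feature vector and an LSTM question encoding are fused and
# classified over a fixed set of candidate answers.
import torch
import torch.nn as nn

class LSTMVQA(nn.Module):
    def __init__(self, vocab_size, num_answers,
                 embed_dim=300, hidden_dim=512, img_feat_dim=4096):
        super().__init__()
        # Question branch: word embeddings fed through an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Image branch: project pre-extracted CNN features into the
        # same space as the question encoding.
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # Fusion + classifier over the answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, num_answers),
        )

    def forward(self, img_feats, question_tokens):
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                        # final LSTM hidden state
        v = torch.tanh(self.img_proj(img_feats))
        fused = q * v                    # element-wise fusion (one common choice)
        return self.classifier(fused)    # logits over candidate answers

# Smoke test with random inputs: batch of 2, questions of 12 tokens.
model = LSTMVQA(vocab_size=10000, num_answers=1000)
logits = model(torch.randn(2, 4096), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 1000])
```

Treating VQA as classification over a vocabulary of frequent answers is the standard formulation for MLP- and LSTM-based baselines of this kind; the element-wise product is one common fusion choice among several (concatenation being another).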