Academic Paper

Multimodal Sentiment Analysis: Techniques, Implementations and Challenges across Diverse Modalities
Document Type
Conference
Source
2024 11th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 405-413, Feb. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
General Topics for Engineers
Geoscience
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Keywords
YOLO
Analytical models
Sentiment analysis
Visualization
Recurrent neural networks
Computational modeling
Transforms
Real-time
Multimodal Sentiment Analysis
Text
Image
Video
Audio
Deep Learning Architectures
YOLOv5
MFCC-based Models
NLP Models
Benchmark
Dataset Division
Generalizability
Modality Significance
Emotion Analysis
Framework
Evaluation
Innovation
Language
Abstract
Multimodal Sentiment Analysis has gained significant attention in Machine Learning, as it yields better results and a deeper understanding of context than analyzing a single type of data in isolation. In recent years, the field has seen growing activity, with new datasets and advanced models designed to handle the complexities of analyzing emotions across different types of data. This paper presents an approach for real-time emotion analysis over diverse inputs: text, images, videos, and audio. We begin by establishing baseline models and curating datasets for thorough evaluation. Our research applies three deep learning techniques, Convolutional Neural Networks (CNN), Visual Geometry Group networks (VGG), and Recurrent Neural Networks (RNN), to improve sentiment analysis accuracy. Additionally, we integrate YOLOv5 for image processing, Mel Frequency Cepstral Coefficient (MFCC)-based frameworks for audio analysis, and Natural Language Processing (NLP) models for text interpretation. Uniquely, we transcribe audio into text, allowing dual-mode evaluation of the same input, as audio and as text, using our NLP models. Our study also examines factors often neglected in emotion analysis, such as the impact of varying data sources and the system's performance across different scenarios. Our approach departs from existing methodologies by offering a more holistic and versatile framework, one that not only sets a standard for future research but also highlights the considerations essential for effective real-time emotion analysis across modalities.
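
The abstract mentions MFCC-based audio analysis and the transcription of audio into text for dual-mode evaluation. The paper's own implementation is not reproduced in this record; the sketch below only illustrates that dual-mode idea under assumed tooling (librosa for MFCC extraction, the SpeechRecognition package for transcription, and a generic pretrained Hugging Face sentiment pipeline). The file name, sample rate, pooling step, and model choices are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not the paper's code): evaluate one audio clip in two modes,
# (1) as audio via MFCC features, (2) as text via transcription + an NLP model.
# Library and parameter choices here are assumptions, not taken from the paper.
import librosa
import numpy as np
import speech_recognition as sr
from transformers import pipeline


def mfcc_features(path: str, n_mfcc: int = 13) -> np.ndarray:
    """Load an audio file and return a fixed-size MFCC vector (mean over time
    frames) that could feed an audio-side sentiment classifier."""
    y, rate = librosa.load(path, sr=16000)                     # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=rate, n_mfcc=n_mfcc)   # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)                                   # simple pooling over time


def transcribe(path: str) -> str:
    """Convert speech to text so the same clip can also be scored by a text model."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)                  # web API; may fail offline


if __name__ == "__main__":
    clip = "sample.wav"                                        # hypothetical input file
    features = mfcc_features(clip)                             # audio-mode representation
    print("MFCC feature vector shape:", features.shape)

    text = transcribe(clip)                                    # text-mode representation
    text_sentiment = pipeline("sentiment-analysis")            # generic pretrained model
    print("Text-mode sentiment:", text_sentiment(text))
```

In a full system, the pooled MFCC vector would be passed to a trained audio classifier rather than printed, and the two per-modality scores would be fused; the fusion strategy is not specified in this record.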