Academic Paper
Multimodal Sentiment Analysis: Techniques, Implementations and Challenges across Diverse Modalities
Document Type
Conference
Source
2024 11th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 405-413, Feb. 2024
Abstract
Multimodal sentiment analysis has gained significant attention in machine learning. Compared with analyzing a single type of data, it provides better results and a deeper understanding of context. In recent years the area has drawn growing focus, with new datasets and advanced models designed to handle the complexities of analyzing emotions across different types of data. This paper presents an approach for real-time emotion analysis using diverse inputs: text, images, video, and audio. We begin by establishing baseline models and curating datasets for thorough evaluation. We then apply three deep learning techniques, Convolutional Neural Networks (CNN), Visual Geometry Group networks (VGG), and Recurrent Neural Networks (RNN), to improve sentiment analysis accuracy. In addition, we integrate YOLOv5 for image processing, Mel Frequency Cepstral Coefficient (MFCC)-based frameworks for audio analysis, and Natural Language Processing (NLP) models for text interpretation. We also transcribe audio into text, so that each audio input can be evaluated in two modes: as audio, and as text using our NLP models. Our study examines factors often neglected in emotion analysis, such as the impact of varying data sources and the system’s performance in different scenarios. Our approach differs from existing methodologies by offering a more holistic and versatile framework. The framework not only sets a standard for future research but also highlights the considerations needed for effective real-time emotion analysis across modalities.
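The dual-mode evaluation of audio described in the abstract (once as audio, once as transcribed text) could be combined roughly as in the sketch below. The abstract does not say how the two evaluations are merged, so the weighted-average (late fusion) rule here is an assumption, and the label set, the names `fuse_scores` and `predict`, the `audio_weight` parameter, and the placeholder scores are all hypothetical rather than taken from the paper.

```python
from typing import Dict

# Hypothetical label set; the abstract does not list the emotion classes used.
LABELS = ("negative", "neutral", "positive")


def fuse_scores(audio_scores: Dict[str, float],
                text_scores: Dict[str, float],
                audio_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of two per-label score dicts, renormalized to sum to 1.

    This late-fusion rule is an assumption, not the paper's stated method.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    fused = {
        label: audio_weight * audio_scores.get(label, 0.0)
        + (1.0 - audio_weight) * text_scores.get(label, 0.0)
        for label in LABELS
    }
    total = sum(fused.values())
    if total == 0.0:
        raise ValueError("scores sum to zero")
    return {label: value / total for label, value in fused.items()}


def predict(fused: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)


# Placeholder scores standing in for the outputs of an MFCC-based audio model
# and an NLP model run on the transcript of the same clip (made-up values).
audio_scores = {"negative": 0.6, "neutral": 0.3, "positive": 0.1}
text_scores = {"negative": 0.2, "neutral": 0.3, "positive": 0.5}

fused = fuse_scores(audio_scores, text_scores, audio_weight=0.5)
print(predict(fused))
```

Setting `audio_weight` closer to 0 or 1 shifts the decision toward the transcript-based or the audio-based evaluation respectively, which is one simple way to probe how much each mode contributes.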