Academic Paper

Enhancing MBTI Personality Trait Prediction from Imbalanced Social Media Data Using Hybrid Query Expansion Ranking and GloVe-BiLSTM
Document Type
Conference
Source
2023 IEEE International Conference on Fuzzy Systems (FUZZ), pp. 1-6, Aug. 2023
Subject
Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Social networking (online)
Computational modeling
Machine learning
Predictive models
Media
Feature extraction
Data models
personality prediction
social media data
query expansion ranking
global vectors for word embedding
bidirectional long short-term memory
Language
ISSN
1558-4739
Abstract
Machine learning is increasingly used to extract useful information from social media data, including predicting a person's personality. One of the personality type theories most often used today to describe personality is the Myers-Briggs Type Indicator (MBTI). Processing social media text with machine learning faces two challenges: imbalanced data across personality types and the high-dimensional features extracted from the data. Handling the imbalance with oversampling techniques further increases the feature dimensionality, which in turn increases computation time. On the other hand, reducing the feature dimensions affects the quality of the prediction results, because the machine learning process requires an adequate amount of data. This study develops a hybrid QER and GloVe-BiLSTM model that combines a Bidirectional Long Short-Term Memory (BiLSTM) classifier layer with Global Vectors for Word Representation (GloVe) and Query Expansion Ranking (QER) as an input layer. The model works on data that has first been balanced using the Synthetic Minority Oversampling Technique (SMOTE). The experimental findings show that the proposed model significantly enhances personality prediction performance in terms of both prediction accuracy and computation time.
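The balancing step named in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote_oversample`, the parameters `k` and `seed`, and the toy 2-D points are assumptions made for illustration; the record does not state the feature representation or SMOTE settings used in the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples.

    Hypothetical helper for illustration; settings are assumed, not from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four points on the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies. The sample count grows while the number of feature columns stays the same.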