Academic Paper

Investigating the Performance Impact of Dimensionality Reduction on Word Vectors
Document Type
Conference
Source
2023 14th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), pp. 319-324, Jul. 2023
Subject
Computing and Processing
Dimensionality reduction
Training
Sentiment analysis
Costs
Semantics
Syntactics
Benchmark testing
word embedding
dimensionality reduction
word similarity
word analogy
Language
Abstract
Word embeddings have been essential in advancing state-of-the-art benchmarks in many natural language processing tasks. When training such word embeddings, the dimension of the vectors is an important hyperparameter that determines the cost in terms of training time and storage. Because retraining is expensive and finding a good balance between storage volume and performance is difficult, practitioners may be interested in applying dimensionality reduction algorithms such as PCA to obtain smaller vectors at lower computational and storage cost. This experimental study explores how far PCA can reduce the dimension of word vectors without sacrificing too much performance. Our findings suggest that, within the range of dimensions commonly used for word vectors, PCA can reduce word vectors to half their size while maintaining reasonable performance on downstream tasks, though possibly at the cost of missing the optimal performance attainable by retraining.
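As a rough illustration of the technique the abstract describes (not the authors' code), the sketch below applies PCA via a centered SVD to halve the dimension of a set of word vectors. The embedding matrix, its shape, and the target dimension are hypothetical placeholders, assuming only that vectors are stored as rows of a NumPy array.

```python
import numpy as np

def pca_reduce(vectors: np.ndarray, k: int) -> np.ndarray:
    """Project row vectors onto their top-k principal components.

    PCA via SVD of the mean-centered matrix: the rows of Vt are the
    principal axes, ordered by explained variance.
    """
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Hypothetical vocabulary of 1000 words with 300-dimensional vectors,
# reduced to half their size as in the study's setup.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 300))
reduced = pca_reduce(embeddings, 150)
print(reduced.shape)  # (1000, 150)
```

The reduced vectors can then be evaluated on the downstream tasks the paper mentions (word similarity, word analogy, sentiment analysis) to measure how much performance is lost at each target dimension.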