Academic Paper

Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions
Document Type
Journal Article
Source
IEICE Transactions on Information and Systems. 2023, E106.D(5):1098
Subject
feature enhancement
long short-term memory
multi-head attention
speech emotion recognition
Language
English
ISSN
0916-8532
1745-1361
Abstract
To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified to use the last state of the long short-term memory (LSTM) output, matching the time-accumulation characteristic of the LSTM. In the feature dimension, the scaled dot-product attention used to construct multi-head attention is replaced with additive attention modeled on the LSTM state update, so that a nonlinear transformation replaces the linear mapping of classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that this attention mechanism can enhance emotional information and improve speech emotion recognition performance.
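The following is a minimal sketch, not the authors' released code, of one way to read the two attention variants the abstract describes: a time-dimension multi-head attention whose query is taken from the LSTM's last output state, and a feature-dimension attention that scores features with an additive (tanh) nonlinearity in the spirit of the LSTM state update rather than a purely linear projection. PyTorch, the class names, and all layer sizes are illustrative assumptions.

```python
# Illustrative sketch only; architecture details, names, and sizes are assumed,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeMultiHeadAttention(nn.Module):
    """Time-dimension attention: the query is built from the LSTM's last
    output state, so the weights follow the LSTM's time accumulation."""
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.d_k = heads, dim // heads
        self.w_q = nn.Linear(dim, dim)   # applied to the last LSTM state
        self.w_k = nn.Linear(dim, dim)   # applied to the full output sequence
        self.w_v = nn.Linear(dim, dim)

    def forward(self, seq, last):        # seq: (B, T, dim), last: (B, dim)
        B, T, _ = seq.shape
        q = self.w_q(last).view(B, self.heads, 1, self.d_k)
        k = self.w_k(seq).view(B, T, self.heads, self.d_k).transpose(1, 2)
        v = self.w_v(seq).view(B, T, self.heads, self.d_k).transpose(1, 2)
        att = F.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)
        return (att @ v).reshape(B, -1)  # (B, dim), pooled over time

class FeatureAdditiveAttention(nn.Module):
    """Feature-dimension attention: an additive tanh scoring function and a
    sigmoid gate (LSTM-update style) replace the linear dot-product mapping."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, dim))

    def forward(self, x):                # x: (B, dim)
        gate = torch.sigmoid(self.score(x))   # per-feature attention weights
        return x * gate

# Hypothetical usage: 2 utterances, 100 frames of 40-dim acoustic features.
lstm = nn.LSTM(40, 128, batch_first=True)
out, (h, _) = lstm(torch.randn(2, 100, 40))
pooled = TimeMultiHeadAttention(128)(out, h[-1])    # attend over time
enhanced = FeatureAdditiveAttention(128)(pooled)    # re-weight features
```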