普通话多模态情感语音数据库构建与评测 / Construction and Evaluation of Mandarin Multimodal Emotional Speech Database
Document Type
Academic Journal
Source
复旦学报(自然科学版) / Journal of Fudan University (Natural Science), 63(1): 18-31
Subject
emotional speech database
multimodal emotion recognition
dimensional emotional space
three-dimensional electromagnetic articulography
electroglottography
Language
Chinese
ISSN
0427-7104
Abstract
This paper designs and establishes a multimodal emotional speech database of Mandarin Chinese comprising articulatory kinematic, acoustic, glottal, and facial micro-expression data. Its construction is described in detail, covering corpus design, participant selection, recording procedure, and data processing. Every signal carries both discrete emotion labels (neutral, pleasant, happy, apathetic, angry, sad, grief) and dimensional emotion labels (pleasure, activation, dominance). The dimensional annotations are statistically analyzed to verify their validity, and the annotators' SCL-90 scale data are combined with the PAD annotations to explore the intrinsic relationship between outliers in the annotation and the annotators' psychological condition. To verify the speech quality and emotional discriminability of the database, three baseline models, Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Deep Neural Network (DNN), are used to compute recognition rates for the seven emotion categories. The results show that the average recognition rate over the seven emotions reaches 82.56% with acoustic data alone, 72.51% with glottal data alone, and 55.67% with kinematic data alone. The database is therefore of high quality and can serve as an important resource for the speech analysis research community, in particular for multimodal emotional speech analysis.
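
As a rough illustration of the kind of baseline the abstract describes, the sketch below trains an SVM over acoustic feature vectors for the seven emotion classes. It is a minimal sketch under stated assumptions: the feature dimensionality, the placeholder loader, and all identifiers here are hypothetical stand-ins for the database's real feature-extraction pipeline, which the abstract does not specify.

# Minimal sketch of an SVM baseline for 7-class emotion recognition,
# in the spirit of the abstract. Illustrative only: load_features() is
# a dummy placeholder, not the paper's actual acoustic front end.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# The seven discrete emotion labels used in the database.
EMOTIONS = ["neutral", "pleasant", "happy", "apathetic", "angry", "sad", "grief"]

def load_features():
    """Placeholder loader: returns (N, D) acoustic feature vectors and
    integer emotion labels. Replace with real feature extraction
    (e.g. MFCCs) over the database's recordings."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(700, 40))           # dummy 40-dim feature vectors
    y = rng.integers(0, len(EMOTIONS), 700)  # dummy labels in [0, 7)
    return X, y

X, y = load_features()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Standardize features, then fit an RBF-kernel SVM.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)

pred = clf.predict(scaler.transform(X_test))
print(f"7-class accuracy: {accuracy_score(y_test, pred):.4f}")

Swapping the SVC for a small CNN or DNN classifier over the same features would mirror the other two baselines named in the abstract; the reported per-modality rates (acoustic, glottal, kinematic) come from running such models on each signal stream separately.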