Academic Paper

Speaker Turn Aware Similarity Scoring for Diarization of Speech-Based Cognitive Assessments

Document Type

Conference

Author

Xu, Sean Shensheng; Mak, Man-Wai; Wong, Ka Ho; Meng, Helen; Kwok, Timothy C.Y.

Source

2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1299-1304, Dec. 2021

Subject

Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Convolution
Training data
Information processing
Acoustic measurements
Data models
Acoustics
Microphones

ISSN

2640-0103

Abstract

This paper proposes two enhancements to conventional speaker diarization methods for speech-based Montreal Cognitive Assessments (MoCA). The enhancements address the technical challenges of MoCA recordings on two fronts. First, a multi-scale channel-interdependence speaker embedding is used as the front-end speaker representation to overcome the acoustic mismatch caused by far-field microphones. Specifically, a squeeze-and-excitation (SE) unit and channel-dependent attention are added to Res2Net blocks for multi-scale feature aggregation. Second, a sequence comparison approach with a holistic view of the whole conversation is applied to measure the similarity of short speech segments in the conversation, which results in a speaker-turn-aware scoring matrix for the subsequent clustering step. Evaluations on an interactive dialog dataset for MoCA show that the proposed enhancements lead to a diarization system that outperforms conventional x-vector/PLDA systems under language, age, and microphone mismatch scenarios. The results also show that the speaker-turn timestamps can be hypothesized, suggesting that the proposed enhancements are amenable to datasets without speaker timestamp information.
