e-Article

OpenFEAT: Improving Speaker Identification by Open-Set Few-Shot Embedding Adaptation with Transformer

Document Type

Conference

Author

Kishan, K C; Tan, Zhenning; Chen, Long; Jin, Minho; Han, Eunjung; Stolcke, Andreas; Lee, Chul

Source

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7062-7066, May 2022

Subject

Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Training
Visualization
Error analysis
Signal processing algorithms
Signal processing
Transformers
Acoustics
speaker identification
embedding adaptation
few-shot open-set learning

Language

ISSN

2379-190X

Abstract

Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics. A common embedding space learned from a large number of speakers is not universally applicable for the optimal identification of every speaker in a household. In this work, we first formulate household speaker identification as a few-shot open-set recognition task and then propose a novel embedding adaptation framework to adapt speaker representations from the given universal embedding space to a household-specific embedding space using a set-to-set function, yielding better household speaker identification performance. With our algorithm, Open-set Few-shot Embedding Adaptation with Transformer (openFEAT), we observe that the speaker identification equal error rate (IEER) on simulated households with 2 to 7 hard-to-discriminate speakers is reduced by 23% to 31% relative.
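
The adaptation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each enrolled speaker is summarized by the mean of their enrollment embeddings, that the set-to-set function is a single residual self-attention step over those household prototypes, and that open-set rejection is a cosine-score threshold. The function names, the random untrained weights, and the threshold value are all hypothetical; the paper's actual architecture, training loss, and scoring are not reproduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototypes(enroll):
    """enroll maps speaker name -> (n_utts, dim) array of universal embeddings.
    Returns sorted names and an (n_spk, dim) array of mean embeddings."""
    names = sorted(enroll)
    protos = np.stack([np.asarray(enroll[n], dtype=float).mean(axis=0) for n in names])
    return names, protos


def adapt(protos, w_q, w_k, w_v):
    """One residual self-attention step over the household's prototype set
    (an assumed stand-in for the set-to-set function)."""
    q, k, v = protos @ w_q, protos @ w_k, protos @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return l2_normalize(protos + attn @ v)


def identify(test_emb, names, adapted, threshold):
    """Return the best-matching enrolled speaker, or None for a guest
    when the best cosine score falls below the (hypothetical) threshold."""
    scores = adapted @ l2_normalize(np.asarray(test_emb, dtype=float))
    best = int(np.argmax(scores))
    return names[best] if scores[best] >= threshold else None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    # Simulated household: three speakers, four enrollment utterances each.
    centers = {name: rng.normal(size=dim) for name in ("alice", "bob", "carol")}
    enroll = {n: c + 0.1 * rng.normal(size=(4, dim)) for n, c in centers.items()}
    names, protos = prototypes(enroll)
    # Untrained placeholder weights; a real system would learn these.
    w_q, w_k, w_v = (0.1 * rng.normal(size=(dim, dim)) for _ in range(3))
    adapted = adapt(protos, w_q, w_k, w_v)
    test = centers["bob"] + 0.1 * rng.normal(size=dim)
    print(identify(test, names, adapted, threshold=0.5))
```

Because attention is computed only over the speakers enrolled in one household, each prototype is shifted relative to the others in that household, which is the sense in which the embedding space becomes household-specific in this sketch.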
