Academic Article

Tuning the performance of automatic speaker recognition in different conditions: effects of language and simulated voice disguise.
Document Type
Article
Source
International Journal of Speech, Language & the Law. 2019, Vol. 26 Issue 2, p209-229. 21p.
Subject
*Automatic speech recognition
*Speech processing systems
*Disguise
*Human voice
Electronic voice alteration
Voiceprints
Language
ISSN
1748-8885
Abstract
Automatic speaker recognition (ASR) applications have often been described as a 'black box'. This study explores the benefit of tuning procedures (condition adaptation and reference normalisation) implemented in an i-vector PLDA framework ASR system, VOCALISE. These procedures enable users to open the black box to a certain degree. Subsets of two 100-speaker databases, one of Czech and the other of Persian male speakers, are used for the baseline condition and for the tuning procedures. The effects of tuning with cross-language material and of simulated voice disguise (achieved by raising the fundamental frequency by four semitones and the resonance characteristics by 8%) are also examined. The results show better recognition performance (lower EER) for Persian than for Czech in the baseline condition, but the opposite result in the simulated disguise condition; possible reasons for this are discussed. Overall, the study suggests that both condition adaptation and reference normalisation are beneficial to recognition performance. [ABSTRACT FROM AUTHOR]
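Two quantities in the abstract can be made concrete with a little arithmetic: the size of a four-semitone shift in fundamental frequency, and the equal error rate (EER) used as the performance measure. The Python sketch below is illustrative only. It assumes 12-tone equal temperament for the semitone conversion and estimates the EER with a simple threshold sweep over comparison scores. The function names and the example values are hypothetical; the record does not describe how the study or VOCALISE actually implements either step.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` (assumes 12-tone equal temperament)."""
    return 2.0 ** (semitones / 12.0)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A comparison is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest; the EER is reported as
    their mean at that threshold.
    """
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # Same-speaker comparisons wrongly rejected at this threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Different-speaker comparisons wrongly accepted at this threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical speaker values, chosen only to show the arithmetic.
    f0_hz = 120.0
    formant_hz = 500.0
    print("F0 after +4 semitones:", f0_hz * semitone_ratio(4))
    print("Formant after +8%:", formant_hz * 1.08)

    # Hypothetical comparison scores (higher = more likely same speaker).
    same_speaker = [2.0, 1.5, 0.5, 3.0]
    different_speaker = [-1.0, 0.0, 1.0, -2.0]
    print("EER, threshold:", equal_error_rate(same_speaker, different_speaker))
```

A four-semitone shift corresponds to a ratio of 2^(4/12), roughly 1.26, so it raises the fundamental frequency by about 26%, considerably more than the 8% applied to the resonance characteristics. The threshold sweep is a coarse estimate that suits small score sets; evaluation toolkits usually interpolate between operating points instead.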