Academic Article

Feature extraction using GTCC spectrogram and ResNet50 based classification for audio spoof detection.
Document Type
Article
Source
International Journal of Speech Technology. Mar2024, Vol. 27 Issue 1, p225-237. 13p.
Subject
*Deep learning
Convolutional neural networks
Feature extraction
Spectrograms
Hindi language
Language
ISSN
1381-2416
Abstract
With the increasing adoption of voice-based authentication systems, the threat of audio spoofing attacks has become a significant concern. These attacks aim to deceive voice authentication systems by manipulating or impersonating audio signals. To improve audio security, we introduce a spectrogram-based solution. Spectrograms, known for their effectiveness in audio analysis and feature extraction, offer valuable insights for combating audio spoofing. Our proposed model is divided into two parts: a frontend and a backend. For the frontend, the model extensively investigates the utility of four spectrogram types: Mel Spectrogram, Gammatone Cepstral Coefficients (GTCC) Spectrogram, Acoustic Ternary Pattern (ATP) Spectrogram, and Mel-Frequency Cepstral Coefficients (MFCC) Spectrogram. For the backend, two deep learning models, a Convolutional Neural Network (CNN) and a Residual Network (ResNet50), are each paired individually with these four spectrograms. The effectiveness of the proposed system is validated through experiments on the ASVspoof 2019 Logical Access (LA) and Physical Access (PA) evaluation datasets and on our own Voice Impersonation Corpus in Hindi Language (VIHL) dataset. The results demonstrate that the combination of GTCC spectrograms and ResNet50 outperforms all other combinations tested, achieving an Equal Error Rate (EER) of 0.6%, 1.15%, and 4.3% for LA, PA, and VIHL, respectively. [ABSTRACT FROM AUTHOR]
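The abstract reports results as Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how EER can be estimated from classifier scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy score values are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER from two sets of scores.

    Assumption: a higher score means the sample is more likely bona fide.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # FRR: fraction of bona fide samples rejected (score below threshold).
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # FAR: fraction of spoofed samples accepted (score at or above threshold).
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


# Toy scores for illustration only (not taken from the paper).
toy_bonafide = [0.9, 0.8, 0.4, 0.7]
toy_spoof = [0.1, 0.2, 0.5, 0.3]
eer, threshold = equal_error_rate(toy_bonafide, toy_spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On large evaluation sets, this threshold sweep is usually replaced by a sorted cumulative computation or an ROC routine from a library, but the definition of the metric is the same.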