Academic Paper
Audio Deepfake Detection Using Data Augmented Graph Frequency Cepstral Coefficients
Document Type
Conference
Author
Source
2023 International Conference on System, Computation, Automation and Networking (ICSCAN), pp. 1-6, Nov. 2023
Subject
Language
Abstract
Automatic speaker verification (ASV) systems play an important role in identifying speakers across a variety of domains, supporting authentication, convenience, fraud detection, personalization, and forensic applications. The demand for ASV systems stems from the simplicity and effectiveness of speech biometrics. The growing popularity of such applications raises concerns about the increasing risk of speech spoofing attacks. The purpose of this research is to detect audio spoofing attacks on an ASV system. The proposed model has a front-end and a back-end. The front-end extracts two features: Gammatone Cepstral Coefficients (GTCC) and Graph Frequency Cepstral Coefficients (GFCC) based on spectrograms. The back-end uses four machine learning models, Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbour (KNN), along with one deep learning model, Long Short-Term Memory (LSTM), and a transfer-learning-based pretrained ResNet50 model. The Logical Access (LA) partition of ASVspoof 2021 is used for training, whereas the Deepfakes (DF) partition of ASVspoof 2021 is used for testing. To address dataset imbalance, augmentation methods such as SpecAugment and speed perturbation are applied to the extracted features, particularly the GTCC features. For deepfake detection, the proposed model, which combines GFCC with a pretrained ResNet50, obtains an Equal Error Rate (EER) of 1.78% and a minimum tandem Detection Cost Function (min t-DCF) of 0.0458.
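For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how an equal error rate can be estimated from detection scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the toy scores, and the convention that higher scores mean "more likely bona fide" are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumption (not from the paper): a higher score means the utterance
    is more likely bona fide, and a trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap, eer, threshold)
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical toy scores, chosen only to illustrate the computation.
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.1, 0.2, 0.3, 0.6]
    eer, threshold = equal_error_rate(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores, one bona fide trial and one spoofed trial are misclassified at the threshold 0.6, so both error rates equal 25%. An EER of 1.78% means that, at the operating point where the two error rates coincide, roughly 1.78% of trials of each kind are misclassified.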