Academic Paper

Audio Deepfake Detection Using Data Augmented Graph Frequency Cepstral Coefficients
Document Type
Conference
Source
2023 International Conference on System, Computation, Automation and Networking (ICSCAN), pp. 1-6, Nov. 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
General Topics for Engineers
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Support vector machines
Training
Deepfakes
Cepstral analysis
Computational modeling
Perturbation methods
Transfer learning
ASV
GTCC
EER
LA
GSP
ResNet50
LSTM
Abstract
Automatic speaker verification (ASV) systems play an important role in identifying speakers across a variety of domains by enabling authentication, convenience, fraud detection, personalization, and forensic applications. The demand for ASV systems stems from the simplicity and effectiveness of speech biometrics. The growing popularity of such applications raises concerns about the increasing risk of speech attacks. The purpose of this research is to identify audio spoofing attacks in an ASV system. The suggested model has a front end and a back end. The front end extracts two features: Gammatone Cepstral Coefficients (GTCC) and Graph Frequency Cepstral Coefficients (GFCC) based on spectrograms. The back end uses four machine learning models, Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and K-nearest Neighbour (KNN), as well as one deep learning model, Long Short-Term Memory (LSTM), and a transfer-learning-based pretrained ResNet50 model. The Logical Access (LA) partition of ASVspoof 2021 is used for training, whereas the Deepfake (DF) partition of ASVspoof 2021 is used for testing. To address dataset imbalance, augmentation methods such as SpecAugment and speed perturbation are applied to the extracted features, particularly the GTCC features. For deepfake detection, the suggested model, which combines GFCC with the pretrained ResNet50, achieves an Equal Error Rate (EER) of 1.78% and a minimum tandem Detection Cost Function (min t-DCF) of 0.0458.
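The Equal Error Rate reported in the abstract is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how such a value could be computed from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the simple threshold sweep are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption (hypothetical convention): higher score = more likely bona fide.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score seen in either class.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # FRR: fraction of bona fide trials rejected (score below threshold).
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # FAR: fraction of spoofed trials accepted (score at or above threshold).
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]
```

Calling `equal_error_rate(bonafide_scores, spoof_scores)` on two arrays of detector outputs returns the estimated EER as a fraction (multiply by 100 for a percentage) together with the threshold at which it occurs. Official ASVspoof scoring tools may interpolate between thresholds, so values can differ slightly from this sketch.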