Academic paper

A Dual-path Conformer-Based Network for Neural Speech Coding
Document Type
Conference
Source
2024 IEEE 14th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 661-665, Nov. 2024
Subject
Computing and Processing
Signal Processing and Analysis
Training
Speech codecs
Time-frequency analysis
Speech coding
Vector quantization
Neural networks
Speech enhancement
Real-time systems
Thin film transistors
Spectrogram
Neural speech coding
conformer
Abstract
In this paper, we propose a neural speech coding method based on the dual-path conformer. The method consists of three main components: (1) the time-frequency spectrum is encoded and decoded by a structure that combines convolutional neural networks (CNNs) with dual-path conformers; (2) residual vector quantization is applied to the encoder's output features to form a compact discrete representation; and (3) multi-period and multi-scale discriminators are used during adversarial training to improve the perceptual quality of the reconstructed speech. Subjective and objective evaluations show that the proposed codec outperforms both the state-of-the-art neural codec AudioDEC and the leading conventional codec Opus.
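The idea behind a dual-path block on a time-frequency representation can be sketched as follows: one path models each frame as a sequence over frequency bins, the other models each frequency bin as a sequence over time frames, so neither path has to attend over all T×F positions at once. This is a minimal illustration, not the paper's architecture: it assumes one scalar per (frame, bin) instead of channel vectors, and `mean_mixer`, `dual_path_block`, and the toy spectrum are hypothetical names and data. In the actual codec each path would be a conformer block.

```python
from typing import Callable, List

Seq = List[float]
Grid = List[List[float]]  # grid[t][f]: one value per (time frame, frequency bin)


def dual_path_block(grid: Grid,
                    intra_model: Callable[[Seq], Seq],
                    inter_model: Callable[[Seq], Seq]) -> Grid:
    """Hypothetical dual-path block: model along frequency, then along time."""
    num_frames, num_bins = len(grid), len(grid[0])
    # Intra path: each frame's spectrum is a sequence over frequency (with a residual connection).
    intra = [[x + y for x, y in zip(row, intra_model(row))] for row in grid]
    # Inter path: transpose so each frequency bin becomes a sequence over time.
    cols = [[intra[t][f] for t in range(num_frames)] for f in range(num_bins)]
    inter = [[x + y for x, y in zip(col, inter_model(col))] for col in cols]
    # Transpose back to the [time][frequency] layout.
    return [[inter[f][t] for f in range(num_bins)] for t in range(num_frames)]


def mean_mixer(seq: Seq) -> Seq:
    """Hypothetical stand-in for a conformer: every position receives the sequence mean."""
    m = sum(seq) / len(seq)
    return [m] * len(seq)


spec = [[1.0, 2.0],   # frame 0: two frequency bins
        [3.0, 4.0]]   # frame 1
out = dual_path_block(spec, mean_mixer, mean_mixer)
print(out)  # [[7.0, 9.0], [11.0, 13.0]]
```

Alternating the two paths lets information reach every (frame, bin) position after one intra pass and one inter pass, while each individual sequence model only sees a sequence of length F or T.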
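Multi-period and multi-scale discriminators differ mainly in how they view the waveform. The sketch below assumes the HiFi-GAN formulation, which the abstract does not confirm: a multi-period discriminator folds the 1-D signal into a 2-D grid with `period` samples per row (reflect-padding the tail), so each column holds samples spaced `period` apart; HiFi-GAN uses periods 2, 3, 5, 7 and 11. A multi-scale discriminator looks at the raw waveform and at progressively average-pooled copies. Only the input views are shown; the convolutional stacks that score them are omitted, and the function names and the simple non-overlapping pooling are hypothetical.

```python
from typing import List, Sequence


def fold_by_period(wave: Sequence[float], period: int) -> List[List[float]]:
    """Multi-period view: fold a 1-D waveform into rows of `period` samples."""
    samples = list(wave)
    remainder = len(samples) % period
    if remainder:
        n_pad = period - remainder
        # Reflect-pad the tail (mirror, excluding the last sample); assumes n_pad < len(samples).
        samples += [samples[-2 - i] for i in range(n_pad)]
    return [samples[i:i + period] for i in range(0, len(samples), period)]


def avg_downsample(wave: Sequence[float], factor: int = 2) -> List[float]:
    """Multi-scale view: average non-overlapping windows of `factor` samples."""
    return [sum(wave[i:i + factor]) / len(wave[i:i + factor])
            for i in range(0, len(wave), factor)]


def multi_scale_views(wave: Sequence[float], num_scales: int = 3) -> List[List[float]]:
    """Raw waveform followed by successively downsampled copies."""
    views = [list(wave)]
    for _ in range(num_scales - 1):
        views.append(avg_downsample(views[-1]))
    return views


wave = [float(n) for n in range(7)]
print(fold_by_period(wave, 3))   # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 5.0, 4.0]]
print(multi_scale_views(wave))
```

Using several coprime periods and several scales lets the adversarial loss penalize artifacts in both periodic (pitch-related) structure and longer-range waveform shape.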