Academic Paper
A Dual-path Conformer-Based Network for Neural Speech Coding
Document Type
Conference
Author
Source
2024 IEEE 14th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 661-665, Nov. 2024
Subject
Language
Abstract
In this paper, we propose a neural speech coding method based on the dual-path conformer. It consists of three main components: (1) an encoder and decoder for the time-frequency spectrum, built from a structure that combines a CNN with the dual-path conformer; (2) residual vector quantization, which quantizes the encoder's output features into a compact discrete representation; and (3) multi-period and multi-scale discriminators, which improve the perceptual quality of the speech during adversarial training. Subjective and objective evaluations show that the proposed codec outperforms the state-of-the-art neural codec AudioDEC and the leading conventional codec Opus.
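The residual vector quantization step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`nearest`, `rvq_encode`, `rvq_decode`), the toy codebooks, and the use of squared Euclidean distance are assumptions chosen for illustration, and the codebooks here are fixed rather than learned. The idea shown is only the general technique: each stage quantizes the residual left by the previous stage, so a feature vector becomes a short list of codebook indices.

```python
def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((c - v) ** 2 for c, v in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen codeword so the next stage sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(codebooks, indices):
    """Reconstruct the vector as the sum of the selected codewords."""
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks for 2-dimensional features (illustrative values only).
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # stage 1: coarse codewords
    [[0.0, 0.0], [0.5, 0.0]],   # stage 2: finer corrections to the residual
]

codes = rvq_encode(toy_codebooks, [1.4, 1.0])
approx = rvq_decode(toy_codebooks, codes)
```

In a trained codec the codebooks would be learned jointly with the encoder and decoder, and the number of stages would determine how many indices are transmitted per frame; neither detail is specified in this record.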