학술논문

Don’t Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer

Document Type

Conference

Author

Chen, Sanyuan; Wu, Yu; Chen, Zhuo; Yoshioka, Takuya; Liu, Shujie; Li, Jinyu; Yu, Xiangzhan

Source

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2021 - 2021 IEEE International Conference on. :6139-6143 Jun, 2021

Subject

Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Adaptation models
Conferences
Signal processing
Acoustics
Acceleration
Task analysis
Speech processing
speech separation
multi-channel microphone
Transformer
deep learning

Language

ISSN

2379-190X

Abstract

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently. However, multi-channel speech separation sometimes does not necessarily need such a heavy structure for all time frames especially when the cross-talker challenge happens only occasionally. For example, in conversation scenarios, most regions contain only a single active speaker, where the separation task downgrades to a single speaker enhancement problem. It turns out that using a very deep network structure for dealing with signals with a low overlap ratio not only negatively affects the inference efficiency but also hurts the separation performance. To deal with this problem, we propose an early exit mechanism, which enables the Transformer model to handle different cases with adaptive depth. Experimental results indicate that not only does the early exit mechanism accelerate the inference, but it also improves the accuracy.

Online Access

Full Text (IEEE) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송