Academic Paper

Ensemble Combination between Different Time Segmentations

Document Type

Conference

Author

Wong, Jeremy H. M.; Dimitriadis, Dimitrios; Kumatani, Kenichi; Gaur, Yashesh; Polovets, George; Parthasarathy, Partha; Sun, Eric; Li, Jinyu; Gong, Yifan

Source

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6768-6772, Jun. 2021

Subject

Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Computational modeling
Conferences
Neural networks
Speech recognition
Signal processing
Acoustics
Task analysis
Combination
segmentation
end-to-end
speech recognition
meeting transcription

Language

English

ISSN

2379-190X

Abstract

Hypothesis-level combination between multiple models can often yield gains in speech recognition. However, all models in the ensemble are usually restricted to use the same audio segmentation times. This paper proposes to generalise hypothesis-level combination, allowing the use of different audio segmentation times between the models, by splitting and re-joining the hypothesised N-best lists in time. A hypothesis tree method is also proposed to distribute hypothesis posteriors among the constituent words, to facilitate such splitting when per-word scores are not available. The approach is assessed on a Microsoft meeting transcription task, by performing combination between a streaming first-pass recognition and an offline second-pass recognition. The experimental results show that the proposed approach can yield gains when combining over different segmentation times. Furthermore, the results also show that a combination between a hybrid model and an end-to-end neural network model yields a greater improvement than a combination between two hybrid models.
