학술논문

Neural Machine Translation Using Multiple Back-translation Generated by Sampling / サンプリング生成に基づく複数逆翻訳を用いたニューラル機械翻訳
Document Type
Journal Article
Source
人工知能学会論文誌 / Transactions of the Japanese Society for Artificial Intelligence. 2020, 35(3):A-1
Subject
diversity
multiple back-translation
neural machine translation
sampling-based sequence generation
Language
Japanese
ISSN
1346-0714
1346-8030
Abstract
A large-scale parallel corpus is indispensable to train encoder-decoder neural machine translation. The method of using synthetic parallel texts, called back-translation, in which target monolingual sentences are automatically translated into the source language, has proven effective in improving the decoder. However, it does not necessarily help enhance the encoder. In this paper, we propose a method that enhances not only the decoder but also the encoder using target monolingual corpora, by generating multiple source sentences via sampling-based sequence generation. The source sentences generated in this way are more diverse and thus help make the encoder robust. Our experimental results show that translation quality improved as the number of synthetic source sentences for each given target sentence increased. Even though the quality did not reach that achieved with a genuine parallel corpus comprising single human translations, our proposed method delivered over 50% of the improvement brought by the parallel corpus while using only its target side, i.e., monolingual data. Moreover, the proposed sampling method resulted in final translations of higher quality than n-best back-translation. These results indicate that not only the quality of back-translation but also the diversity of synthetic source sentences is crucial.
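The core idea of the abstract, generating several diverse synthetic source sentences per target sentence by sampling from the backward model's output distribution rather than taking a single best hypothesis, can be illustrated with a minimal sketch. The toy model, vocabulary, and helper names below are assumptions for illustration only, not the authors' actual NMT system; a real setup would sample from a trained target-to-source translation model.

```python
import math
import random

# Hypothetical toy vocabulary and "backward model" -- assumptions for
# illustration, not the paper's actual target-to-source NMT model.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "<eos>"]

def next_token_logits(prefix):
    # Deterministic toy logits derived from the prefix length (assumption),
    # standing in for a decoder's next-token scores.
    gen = random.Random(len(prefix) * 7919)
    return [gen.uniform(-1.0, 1.0) for _ in VOCAB]

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def greedy_decode(max_len=5):
    # 1-best decoding: always the same synthetic source sentence.
    prefix = []
    for _ in range(max_len):
        probs = softmax(next_token_logits(prefix))
        tok = VOCAB[probs.index(max(probs))]
        if tok == "<eos>":
            break
        prefix.append(tok)
    return prefix

def sample_decode(rng, max_len=5, temperature=1.0):
    # Sampling-based sequence generation: each call draws the next token
    # from the full distribution, so repeated calls yield diverse outputs.
    prefix = []
    for _ in range(max_len):
        probs = softmax(next_token_logits(prefix), temperature)
        tok = rng.choices(VOCAB, weights=probs)[0]
        if tok == "<eos>":
            break
        prefix.append(tok)
    return prefix

rng = random.Random(0)
# Multiple sampled back-translations for one target sentence.
samples = [tuple(sample_decode(rng)) for _ in range(8)]
print("greedy (1-best)        :", greedy_decode())
print("distinct sampled sources:", len(set(samples)))
```

Pairing each of these diverse sampled sources with the same genuine target sentence is what, per the abstract, exposes the encoder to varied inputs and makes it more robust than training on a single back-translation per target.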