Academic Paper

JParaBank: A Large-Scale Sentence Pairs of Japanese Paraphrase via Machine Translation / JParaBank:機械翻訳に基づく大規模な日本語言い換え文対の収集
Document Type
Journal Article
Source
Proceedings of the Annual Conference of JSAI. 2023, :4
Subject
Data Augmentation
Machine Translation
Paraphrase
データ拡張
機械翻訳
言い換え
Language
Japanese
ISSN
2758-7347
Abstract
Data augmentation is a well-established way to address the low-resource problem in machine learning tasks, including natural language processing. In recent natural language processing work, data augmentation based on paraphrase generation has been used successfully in many applications. However, unlike English and Chinese, Japanese has no large-scale corpus for training paraphrase generation models, so data augmentation based on paraphrase generation has not been available for Japanese. We release JParaBank, a large-scale Japanese paraphrase corpus of 21 million sentence pairs. Experimental results on the JGLUE benchmark show that data augmentation by paraphrase generation using JParaBank improves performance on many tasks.
