Academic Article

Multi-domain Adaptation for Statistical Machine Translation Based on Feature Augmentation / 素性空間拡張法に基づくフレーズベース統計翻訳のマルチドメイン適応

Document Type

Journal Article

Author

Eiichiro Sumita; Kenji Imamura; 今村賢治; 隅田英一郎

Source

自然言語処理 / Journal of Natural Language Processing. 2017, 24(4):597

Subject

Corpus-concatenated Model
Domain Adaptation
Empty Value
Feature Augmentation
Phrase-based Statistical Machine Translation
empty 値
コーパス結合モデル
ドメイン適応
フレーズベース統計翻訳
素性空間拡張法

Language

Japanese

ISSN

1340-7619
2185-8314

Abstract

Domain adaptation is a major challenge when machine translation is applied to practical tasks. In this study, we present domain adaptation methods for machine translation that assume multiple domains. The proposed methods combine two types of models: a corpus-concatenated model that covers multiple domains, and single-domain models that are accurate but sparse in their specific domains. We combine the advantages of both models using feature augmentation, a domain adaptation technique from machine learning, whereas the conventional method of feature augmentation for machine translation uses a single model. Our experimental results show that the translation quality of the proposed method was better than or equal to that of the single-domain models. The proposed method is extremely effective in low-resource domains. Even in domains with a million bilingual sentences, translation quality was at least preserved, and in some domains it improved. These results demonstrate that state-of-the-art domain adaptation can be realized with appropriate model selection and settings, even when standard log-linear models are used.
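
The abstract refers to feature augmentation without spelling out the mechanics. As a rough illustration only, the sketch below shows the generic machine-learning form of the idea: each feature vector is copied into a common block shared by all domains and into the block of its own domain, with the other domain blocks left at zero. The function names, the domain names, and the zero-filling are assumptions made for this example; the record does not describe the paper's actual feature layout or how it handles missing features.

```python
from typing import List, Sequence


def augment_features(
    features: Sequence[float], domain: str, domains: Sequence[str]
) -> List[float]:
    """Expand a feature vector into [common | domain_1 | ... | domain_D].

    Hypothetical helper: the common block and the block of the active
    domain hold copies of the features; all other blocks are zero.
    """
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain!r}")
    augmented = list(features)  # common block, shared by every domain
    for d in domains:
        if d == domain:
            augmented.extend(features)  # block of the active domain
        else:
            augmented.extend([0.0] * len(features))  # inactive domain block
    return augmented


def log_linear_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Score of a standard log-linear model: dot product of weights and features."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical two-domain setup with two features per hypothesis.
DOMAINS = ["news", "patent"]
augmented = augment_features([0.5, -1.2], "patent", DOMAINS)
print(augmented)  # [0.5, -1.2, 0.0, 0.0, 0.5, -1.2]
```

With this layout, one weight vector over the augmented space can hold both shared weights (common block) and domain-specific corrections (per-domain blocks), which is what lets a single log-linear model serve several domains.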

Online Access

Open Access (JSTAGE)