Academic Paper

Multi-domain Adaptation for Statistical Machine Translation Based on Feature Augmentation / 素性空間拡張法に基づくフレーズベース統計翻訳のマルチドメイン適応
Document Type
Journal Article
Source
自然言語処理 / Journal of Natural Language Processing. 2017, 24(4):597
Subject
Corpus-concatenated Model
Domain Adaptation
Empty Value
Feature Augmentation
Phrase-based Statistical Machine Translation
empty 値
コーパス結合モデル
ドメイン適応
フレーズベース統計翻訳
素性空間拡張法
Language
Japanese
ISSN
1340-7619
2185-8314
Abstract
Domain adaptation is a major challenge when machine translation is applied to practical tasks. In this study, we present domain adaptation methods for machine translation that assume multiple domains. The proposed methods combine two types of models: a corpus-concatenated model covering multiple domains and single-domain models that are accurate but sparse in specific domains. We combine the advantages of both using feature augmentation, a domain adaptation technique from machine learning; in contrast, the conventional feature augmentation method for machine translation uses a single model. Our experimental results show that the translation quality of the proposed method was equal to or better than that of the single-domain models. The proposed method is extremely effective in low-resource domains. Even in domains with a million bilingual sentences, the translation quality was at least preserved, and it improved in some domains. These results demonstrate that state-of-the-art domain adaptation can be achieved with appropriate model selection and appropriate settings, even when standard log-linear models are used.
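
The abstract does not spell out how the augmented feature space is laid out. As a rough illustration only, the Python sketch below follows the general feature augmentation idea from machine learning: each feature vector is copied into a shared block and into a block reserved for its own domain, while the blocks of all other domains are filled with an empty value. The function names (augment_features, log_linear_score), the block layout, and the default empty value of 0.0 are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence


def augment_features(
    common: Sequence[float],
    domain_specific: Sequence[float],
    domain: str,
    domains: Sequence[str],
    empty_value: float = 0.0,  # assumption: the paper's empty value may differ
) -> List[float]:
    """Lay features out as [common | domain_1 | ... | domain_K].

    `common` stands in for scores from a corpus-concatenated model and
    `domain_specific` for scores from the single-domain model of `domain`.
    Blocks belonging to other domains are filled with `empty_value`.
    """
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain!r}")
    augmented = list(common)
    for d in domains:
        if d == domain:
            augmented.extend(domain_specific)
        else:
            augmented.extend([empty_value] * len(domain_specific))
    return augmented


def log_linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Standard log-linear score: the dot product of weights and features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


# Hypothetical example with two domains and two features per block.
domains = ["news", "medical"]
augmented = augment_features([0.5, -1.0], [0.2, -0.3], "medical", domains)
score = log_linear_score(augmented, [1.0, 1.0, 1.0, 1.0, 2.0, 2.0])
```

Under these assumptions, one weight vector is tuned over the whole augmented space, so the shared block can carry behavior common to all domains while each domain block captures domain-specific preferences.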