Academic Paper
AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages
Document Type
Conference
Author
Adewumi, Tosin; Adeyemi, Mofetoluwa; Anuoluwapo, Aremu; Peters, Bukola; Buzaaba, Happy; Samuel, Oyerinde; Rufai, Amina Mardiyyah; Ajibade, Benjamin; Gwadabe, Tajudeen; Koulibaly Traore, Mory Moussou; Ajayi, Tunde Oluwaseyi; Muhammad, Shamsuddeen; Baruwa, Ahmed; Owoicho, Paul; Ogunremi, Tolulope; Ngigi, Phylis; Ahia, Orevaoghene; Nasir, Ruqayya; Liwicki, Foteini; Liwicki, Marcus
Source
2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, Jun 2023
Subject
Language
ISSN
2161-4407
Abstract
Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda, and Yorùbá. There are a total of 9,000 turns, with 1,500 turns per language, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we benchmark by investigating and analyzing the effectiveness of modelling through transfer learning, utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. In addition, we conduct human evaluation of single-turn conversations using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
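The abstract names two evaluation quantities, perplexity and majority-vote human-likeness, without defining them. The sketch below illustrates the standard textbook versions of both; it is not the authors' evaluation code. The function names, the "human-like" / "not" labels, the use of three annotators per turn, and the choice to report the unanimous share over all rated turns are assumptions made for illustration only.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def majority_vote(votes):
    """Return (winning label, whether the vote was unanimous).

    Assumes an odd number of annotators and two labels, so ties cannot occur.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label, count == len(votes)


def human_likeness(all_votes, positive="human-like"):
    """Share of turns judged `positive` by majority, and the unanimous share.

    Both shares use the total number of rated turns as the denominator
    (an assumption; the abstract does not state the denominator).
    """
    results = [majority_vote(votes) for votes in all_votes]
    positives = [unanimous for label, unanimous in results if label == positive]
    total = len(all_votes)
    return len(positives) / total, sum(positives) / total


# Hypothetical ratings for four single-turn conversations, three annotators each.
ratings = [
    ["human-like", "human-like", "human-like"],
    ["human-like", "human-like", "not"],
    ["not", "not", "human-like"],
    ["human-like", "not", "human-like"],
]
share, unanimous_share = human_likeness(ratings)
```

A lower perplexity means the model assigns higher probability to the reference text, which is why it serves as the automatic comparison against the seq2seq baseline, while the majority vote summarizes the human judgments per turn.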