Academic Paper

Numerical reasoning reading comprehension on Vietnamese COVID-19 news: task, corpus, and challenges
Document Type
Original Paper
Source
Neural Computing and Applications. 36(23):14053-14073
Subject
Machine reading comprehension
Question answering
Numerical reasoning
Natural language understanding
Language
English
ISSN
0941-0643
1433-3058
Abstract
Numerical reasoning-based machine reading comprehension is a challenging task that combines language understanding with arithmetic operations such as addition, subtraction, comparison, and counting. Various studies on numerical reading comprehension have been conducted in English, but low-resource languages such as Vietnamese have received far less attention. Online COVID-19 news contains a large amount of numerical data, which makes it an appropriate data source for this task. To address this gap, we propose COVIDROP, the first challenging Vietnamese machine reading comprehension corpus with numerical reasoning for online COVID-19 news articles. The corpus comprises 6594 human-generated question–answer pairs on 841 Vietnamese online COVID-19 news articles. Furthermore, we evaluated the performance of two numerical reasoning-based machine reading comprehension models, NAQANet and NumNet, on COVIDROP. NAQANet performed best on the test set, with 22.37% exact match (EM) and 26.58% F1. However, human performance (85.47%) is much higher, indicating that the corpus presents a good challenge for future research. Our corpus is available for evaluating numerical reasoning-based machine reading comprehension and question answering.