Academic Paper

Language-Model-based methods for Vietnamese Single-Document Extractive Summarization
Document Type
Conference
Source
2023 14th International Conference on Information and Communication Technology Convergence (ICTC), pp. 967-972, Oct. 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Fields, Waves and Electromagnetics
Power, Energy and Industry Applications
Signal Processing and Analysis
Transportation
Benchmark testing
Information and communication technology
Data mining
Task analysis
Convergence
extractive text summarization
pre-trained language model
single-document
Language
ISSN
2162-1241
Abstract
Recently, there has been a dramatic increase in the amount of text data, which demands effective summarization. This paper proposes a method that combines English text summarization frameworks with Vietnamese pre-trained language models for Vietnamese single-document extractive summarization. The experiments were conducted with three frameworks, namely BertSumExt, MatchSum, and CoLoExt, and two pre-trained language models, namely PhoBERT and BartPho. Our models are evaluated on two well-known Vietnamese summarization benchmark datasets, Vietnews and Wikilingua, and achieve state-of-the-art results on Vietnews, with maximum ROUGE-1/2/L scores of 57.15/26.23/39.76. The results on Wikilingua also show the effectiveness of our methods.