Academic Paper

A Graph and PhoBERT based Vietnamese Extractive and Abstractive Multi-Document Summarization Frame
Document Type
Conference
Source
2022 RIVF International Conference on Computing and Communication Technologies (RIVF), pp. 482-487, Dec. 2022
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Deep learning
Computational modeling
Pipelines
Computer architecture
Communications technology
Indexing
Language
Abstract
Many methods, both extractive and abstractive, have been proposed for the Multi-Document Summarization (MDS) problem, but models that rely on only one of the two types of summarization each have their own disadvantages. A promising approach to MDS is to combine extractive and abstractive summarization. For many languages, and Vietnamese in particular, studies that propose such a combination remain very limited, and the approach has not been explored in depth. In this paper, we propose a new MDS framework for Vietnamese that consists of two components in a pipeline architecture combining the extractive and abstractive approaches. The first component uses an extractive approach to select the most important sentences in each document by constructing graphs whose nodes represent the sentences of the input documents and whose edges represent the relationships between sentences. The selected sentences are clustered into groups of sentences with similar meaning, and each group is then combined into a document. The second component uses an abstractive approach, applying the PhoBERT2PhoBERT model to generate the final summary document. The framework achieves ROUGE-2 scores of 36.42 percent and 34.89 percent on the ViMs and VN-MDS datasets, respectively.
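The abstract does not say how the sentence graph is weighted or scored, so the following is only a minimal sketch of the extractive step under stated assumptions: edge weights are bag-of-words cosine similarities, and sentences are scored with a PageRank-style power iteration (a TextRank-like substitute, not necessarily the authors' method). The names `cosine`, `rank_sentences`, and `select_top` are hypothetical, and whitespace tokenization stands in for a proper Vietnamese word segmenter.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over a sentence-similarity graph.

    Assumption: nodes are sentences and edge weights are cosine similarities;
    the paper's actual edge definition is not given in the abstract.
    """
    n = len(sentences)
    if n == 0:
        return []
    # Whitespace tokenization is a placeholder for real word segmentation.
    bags = [Counter(s.lower().split()) for s in sentences]
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def select_top(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In a pipeline like the one described, the sentences returned by `select_top` would then be clustered by meaning and passed to the abstractive model; those later stages are not sketched here.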