Academic Paper

Identification of Scientific Texts with Similar Argumentation Complexity
Document Type
Conference
Source
2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), pp. 870-875, Nov. 2022
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Analytical models
Dictionaries
Annotations
Computational modeling
Clustering algorithms
Directed graphs
Linguistics
scientific texts
Walton’s argumentation schemes
argumentation graphs
argumentation patterns
functionally similar argumentation schemes
clustering methods
Language
Abstract
This work studies the formal identification of texts with similar argumentation complexity. We analyze scientific articles in Russian using clustering algorithms (K-means, Ward, Spectral). The clustering features are formally calculable characteristics of the argumentation annotations of the dataset texts, so the method is applicable to texts in other genres and languages (after adapting the marker dictionary). The principal limitation is that the method requires quantitative characteristics of the argumentation structures of texts as input; these structures are constructed in accordance with the Argument Interchange Format (in the form of rooted directed graphs) and Walton’s compendium of argumentation schemes. We analyze the performance of the clustering algorithms on different feature sets, which characterize the general properties of argumentation graphs, specific argumentation patterns (subgraphs common to different texts), and the emotionality and authoritativeness of texts. Argumentation patterns are represented in two forms: standard (in accordance with Walton’s compendium) and generalized (based on functional similarity). We check the similarity of the clustering results produced by different algorithms using several quality measures (Jaccard-index-based, V-measure, FM-score), whose values fall in the 64 to 71 percent range. The dataset contains more than 1000 arguments from argumentation annotations (graphs) for 30 scientific texts in two thematic areas (linguistics and information technologies). The argumentation graphs were constructed by two annotators with the ArgNetBankStudio tool. The resulting clusters differ in the general complexity of their argumentation graphs, in the use of specific argumentation patterns, and in emotionality and authoritativeness.
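The abstract compares the outputs of different clustering algorithms with agreement measures but does not give their definitions. The following is a minimal sketch, assuming the common pair-counting forms of the Jaccard index and the Fowlkes-Mallows (FM) score; the function names and the toy label lists are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Count text pairs grouped together by both clusterings, only A, or only B."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same texts")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    return both, only_a, only_b


def jaccard_index(labels_a, labels_b):
    """Pair-counting Jaccard index (assumed variant, not confirmed by the source)."""
    both, only_a, only_b = pair_counts(labels_a, labels_b)
    denom = both + only_a + only_b
    # If no pair is co-clustered by either algorithm, the clusterings agree trivially.
    return both / denom if denom else 1.0


def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows score: geometric mean of pairwise precision and recall."""
    both, only_a, only_b = pair_counts(labels_a, labels_b)
    denom = sqrt((both + only_a) * (both + only_b))
    return both / denom if denom else 0.0


# Hypothetical cluster labels for six texts from two algorithms.
# Label values are arbitrary; only the grouping of texts matters.
kmeans_labels = [0, 0, 0, 1, 1, 1]
ward_labels = [1, 1, 0, 0, 0, 0]

print(round(jaccard_index(kmeans_labels, ward_labels), 3))
print(round(fowlkes_mallows(kmeans_labels, ward_labels), 3))
```

Both measures depend only on which texts are grouped together, so they are unaffected when two algorithms number their clusters differently.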