학술논문

Semantic Clustering and Transfer Learning in Social Media Texts Authorship Attribution
Document Type
article
Source
IEEE Access, Vol 12, Pp 39783-39803 (2024)
Subject
Authorship attribution
machine learning
natural language processing
semantic clustering
transfer learning
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
Language
English
ISSN
2169-3536
Abstract
This paper is the fourth part of a research series that focuses on determining the authorship of Russian-language texts by analyzing short social media comments, including those from mass media and communities associated with destructive content. Semantic text clustering was used to analyze content and employed a transfer learning technique based on a pre-trained model to identify sensitive topics. Authorship attribution is implemented as a classical classification task with a closed set of authors and a more challenging open-set task. In the latter case, multiple experiments were conducted, incorporating the identification of destructive content with known authors and artificially generated texts. For open attribution, a method combining One-Class SVM and fastText was proposed. Results demonstrate high accuracy (92% or higher) for cases with 2 and 5 authors, regardless of comment length and the additional task of identifying authors of destructive text. Mixed-data experiments involving 10 or more authors yielded results comparable to or more accurate (84% or higher) than previous studies.