
PaperMill Detection in Scientific Content
2023 18th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP 2023), pp. 1-6, Sep. 2023
Communication, Networking and Broadcast Technologies
Computing and Processing
Paper mills
Artificial intelligence
paper mill
artificial intelligence generated text
research integrity
artificial intelligence
Researchers facing ever-increasing pressure to publish more papers sometimes resort to hiring the services of paper mills. These are unofficial, and often illegitimate, organizations that provide ready-made, questionable research components and services, posing a threat to research integrity, the scientific ecosystem, and publishers. Identifying paper mill material is a challenging and laborious process, and the growing number of Artificial Intelligence services that generate human-like text makes it harder still. This paper contributes to the research integrity domain by proposing PaperMill Detection, a manuscript screening framework. By leveraging contextual signals, it estimates the probability that a document was produced by a paper mill or generated by Artificial Intelligence. Combining these signals facilitates the detection of questionable scientific content. Our evaluation shows that the proposed approach outperforms other open-source and commercial solutions on all examined evaluation metrics, achieving an F1 score of 0.97.
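To make the idea of combining signals into a single screening score more concrete, the following is a minimal sketch, not the framework's actual method. The abstract does not name the signals or say how they are combined, so the signal names, the weights, the weighted-average rule, and the 0.5 threshold are all assumptions made purely for illustration. The F1 helper uses the standard definition (the harmonic mean of precision and recall).

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Hypothetical per-document signal scores in [0, 1].
    # The abstract does not name the signals; these are placeholders.
    ai_text_score: float
    metadata_score: float
    citation_score: float


# Hypothetical weights that sum to 1; chosen only for illustration.
WEIGHTS = {"ai_text_score": 0.5, "metadata_score": 0.3, "citation_score": 0.2}


def combined_score(signals: Signals) -> float:
    """Weighted average of the signal scores (an assumed combination rule)."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def is_flagged(signals: Signals, threshold: float = 0.5) -> bool:
    """Flag a document when its combined score reaches an assumed threshold."""
    return combined_score(signals) >= threshold


def f1_score(y_true, y_pred) -> float:
    """Standard F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up documents and labels, used only to exercise the functions.
    docs = [
        Signals(0.9, 0.8, 0.7),
        Signals(0.2, 0.1, 0.3),
        Signals(0.6, 0.2, 0.1),
    ]
    labels = [True, False, True]
    predictions = [is_flagged(d) for d in docs]
    print(predictions, f1_score(labels, predictions))
```

A weighted average keeps the sketch easy to read; a real screening system would more plausibly learn the combination from labelled manuscripts, which is beyond what the abstract describes.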