Academic Article

A Unified Formal Framework for Factorial and Probabilistic Topic Modelling.
Document Type
Article
Source
Mathematics (2227-7390). Oct2023, Vol. 11 Issue 20, p4375. 27p.
Subject
*NATURAL language processing
*FACTORIALS
Language
ISSN
2227-7390
Abstract
Topic modelling has become a highly popular technique for extracting knowledge from texts. It encompasses various method families, including Factorial methods, Probabilistic methods, and Natural Language Processing methods. This paper introduces a unified conceptual framework for Factorial and Probabilistic methods by identifying shared elements and representing them using a homogeneous notation. The paper presents 12 different methods within this framework, enabling easy comparative analysis to assess the flexibility of each approach and how realistic its assumptions are. This establishes the initial stage of a broader analysis aimed at relating all method families to this common framework, comprehensively understanding their strengths and weaknesses, and establishing general application guidelines. An experimental setup also reinforces the convenience of having a harmonized notational schema. The paper concludes with a discussion of the presented methods and outlines future research directions. [ABSTRACT FROM AUTHOR]