Academic Paper

New approach to discover meaningful terms to specify cause of death from narratives verbal autopsy using TF-IDF and the LDA topic model
Document Type
Conference
Source
IEEE EUROCON 2023 - 20th International Conference on Smart Technologies, pp. 502-507, Jul 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Geoscience
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Measurement
Databases
Autopsy
Sociology
Medical services
Reliability
Data mining
LDA
Short Text
Verbal Autopsy
TF-IDF
CoARTEX
Abstract
Because some remote areas of the world lack coroners, epidemiological researchers have created a database for collecting causes of death, called a verbal autopsy. The unstructured verbal autopsy (VA) narratives collected in this database contain a wealth of hidden knowledge about mortality. However, they are under-exploited, either because processing mechanisms are inadequate or because some of the computational techniques used are ill-suited to the data format. In this paper, we propose an unsupervised approach built on a new algorithm for preprocessing such data. The aim is not only to address the challenge of topic extraction with the Latent Dirichlet Allocation (LDA) topic model under data scarcity, but also to improve the exploitation of the extracted topics (causes of death). Experiments with the Population Health Metrics Research Consortium (PHMRC) data demonstrate the validity of the approach and led to the identification of reliable causes of death as well as the discovery of new ones.
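The abstract does not describe the preprocessing algorithm itself, so the following is only a generic sketch of how TF-IDF might be used to prune short narratives before topic modeling. It is not the authors' method. The function names (`tfidf_scores`, `keep_top_terms`), the cutoff `k`, and the toy narratives are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

def keep_top_terms(docs, k):
    """Keep, per document, only the k terms with the highest TF-IDF score."""
    filtered = []
    for doc, doc_scores in zip(docs, tfidf_scores(docs)):
        # Sort by descending score, breaking ties alphabetically.
        top = set(sorted(doc_scores, key=lambda t: (-doc_scores[t], t))[:k])
        filtered.append([t for t in doc if t in top])
    return filtered

# Toy narratives invented for illustration (not PHMRC data).
narratives = [
    "the patient had fever and cough for days".split(),
    "the patient had chest pain and the pain spread".split(),
    "the child had fever and diarrhea".split(),
]
reduced = keep_top_terms(narratives, k=3)
```

Terms that occur in every narrative (such as "the" or "had") receive a score of zero and drop out, leaving the more distinctive terms. The reduced token lists could then be passed to any LDA implementation as input documents.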