학술논문

Automatic Generation of Annotated Corpora of Diagnoses with ICD-10 codes based on Open Data and Linked Open Data
Document Type
Conference
Source
2020 15th Conference on Computer Science and Information Systems (FedCSIS) Computer Science and Information Systems (FedCSIS), 2020 15th Conference on. :163-167 Sep, 2020
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Computer science
Codes
Open data
Information systems
Diseases
Language
Abstract
We propose methods for automatic generation of corpora that contains descriptions of diagnoses in Bulgarian and their associated codes in ICD-10-CM (International Classification of Diseases, 10th revision, Clinical Modification). The proposed approach is based on the available open data and Linked Open Data and can be easily adapted for other languages. The resulted corpora generated for the Bulgarian clinical texts consists of about 370,000 pairs of diagnoses and corresponding ICD-10 codes and is beyond the usual size that can be generated manually, moreover it was created from scratch and for a relatively short time. Further updates of the corpora are also possible whenever new open resources are available or the current ones are updated.