Academic Article

Natural Language Processing–Assisted Classification Models to Confirm Monoclonal Gammopathy of Undetermined Significance and Progression in Veterans' Electronic Health Records.
Document Type
Article
Source
JCO Clinical Cancer Informatics. 11/4/2023, Vol. 7, p1-13. 13p.
Subject
*ELECTRONIC health records
*MACHINE learning
*NATURAL language processing
*NATURAL languages
*NOSOLOGY
*SUPPORT vector machines
*VETERANS' health
Language
ISSN
2473-4276
Abstract
PURPOSE: To develop and validate natural language processing (NLP)–assisted machine learning (ML)–based classification models to confirm diagnoses of monoclonal gammopathy of undetermined significance (MGUS) and multiple myeloma (MM) from electronic health records (EHRs) in the Veterans Health Administration (VHA).
MATERIALS AND METHODS: We developed precompiled lexicons and classification rules as features for the following ML classifiers: logistic regression, random forest, and support vector machines (SVMs). The classifiers were trained with these features on 36,044 EHR documents from a random sample of 400 patients with at least one International Classification of Diseases code for MGUS diagnosis from 1999 to 2021. The best-performing feature combination was calibrated in the validation set (17,826 documents/200 patients) and evaluated in the testing set (9,250 documents/100 patients). Model performance in diagnosis confirmation was compared with manual chart review results (gold standard) using recall, precision, accuracy, and F1 score. For patients correctly labeled as disease-positive, the difference between model-identified diagnosis dates and the gold standard was also computed.
RESULTS: In the testing set, the NLP-assisted classification model using SVMs achieved the best performance in both MGUS and MM confirmation, with recall/precision/accuracy/F1 of 98.8%/93.3%/93.0%/96.0% for MGUS and 100.0%/92.3%/99.0%/96.0% for MM. Dates of diagnoses matched (±45 days) those of the gold standard in 73.0% of model-confirmed MGUS and 84.6% of model-confirmed MM.
CONCLUSION: An NLP-assisted classification model can reliably confirm MGUS and MM diagnoses and dates and extract laboratory results using automated interpretation of EHR data. This algorithm has the potential to be adapted to other disease areas in the VHA EHR system. This work presents a novel NLP-based machine learning model that uses VA EHR data to confirm MGUS and progression diagnoses. [ABSTRACT FROM AUTHOR]
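The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `f1_score` and `dates_match` are hypothetical, and the only inputs taken from the abstract are the reported precision and recall values and the ±45-day matching window. The sketch checks that the reported F1 scores follow from the reported precision and recall, and shows one plausible way to test date agreement.

```python
from datetime import date


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def dates_match(model_date: date, gold_date: date, window_days: int = 45) -> bool:
    """Hypothetical helper: True if the two dates differ by at most window_days.

    The 45-day default mirrors the tolerance stated in the abstract; treating
    the boundary as inclusive is an assumption.
    """
    return abs((model_date - gold_date).days) <= window_days


# Precision and recall as reported for the testing set in the abstract.
mgus_f1 = f1_score(precision=0.933, recall=0.988)
mm_f1 = f1_score(precision=0.923, recall=1.000)

print(f"MGUS F1: {mgus_f1:.1%}")
print(f"MM F1:   {mm_f1:.1%}")
```

Both values round to 96.0%, consistent with the F1 scores reported for MGUS and MM.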