Academic Paper

P212: Development and Validation of Machine Learning and Deep Learning Models to Identify Human Monkeypox.
Document Type
Article
Source
Sexually Transmitted Diseases. 2024 Supplement, Vol. 51, pS228-S229. 2p.
ISSN
0148-5717
Abstract
Background: In 2022, outbreaks of monkeypox virus (mpox) spread across the globe. In New York City (NYC), there were 3,829 cases. Early intervention is vital to provide supportive care and prevent further spread. Yet, in the absence of severe presentation, patients may go undiagnosed at initial visits. Efforts to identify mpox using machine learning (ML) and deep learning (DL) have largely focused on images. Clinical narratives in the electronic health record (EHR) also present a rich source of data and advances in natural language processing (NLP) often achieve state-of-the-art classifier performance. We sought to develop and validate ML/DL models that leverage clinical note text to identify mpox cases.
Methods: We performed a retrospective study of mpox cases at Columbia University Medical Center (CUMC) in NYC between 5/2022 and 10/2022, during which CUMC maintained a list of patients with mpox diagnoses confirmed by polymerase chain reaction testing. For each case, we randomly selected 3 controls matched on age, administrative sex used for billing, race/ethnicity, care site, and visit month. We standardized EHR data using the Observational Medical Outcomes Partnership Common Data Model and obtained clinical notes from diagnosis date up to 30 days prior. We trained 3 mpox classifiers (Table 1) by applying LASSO regression, ClinicalBERT, or ClinicalLongformer to clinical note text. We used a 70%-30% train-test split and standard metrics for model evaluation. We also computed recall at precision of 80% to minimize false positives and alert fatigue.
Results: We identified 228 patients with mpox (6% of NYC cases) and 698 as controls. Median age was 34; sex was male for 902 patients (97%). Our sample comprised 249 (27%) patients who identified as non-Hispanic Black, 117 (13%) as non-Hispanic white, and 316 (34%) as Hispanic/Latino; 244 (26%) were of unknown race/ethnicity. LASSO regression achieved the best performance. Phrases related to lesions, tenderness, exudate, HIV, and Streptococcus were among the most predictive LASSO features aside from mpox mentions.
Conclusions: NLP and ML/DL show promise for enabling accurate, earlier diagnosis of mpox. While changes in clinical documentation (e.g., differential diagnoses) post-outbreak are a known limitation, use of classifiers may also prove effective in reducing missed diagnoses. [ABSTRACT FROM AUTHOR]
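The abstract reports "recall at precision of 80%" but does not say how it was computed. The sketch below shows one common definition: sweep every score threshold and take the highest recall among thresholds whose precision is at least the target. The function name `recall_at_precision`, its parameters, and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def recall_at_precision(y_true, scores, min_precision=0.80):
    """Return the highest recall achievable at any score threshold
    whose precision is at least `min_precision`.

    y_true: iterable of 0/1 labels (1 = case, 0 = control).
    scores: iterable of classifier scores, higher = more likely a case.
    Returns 0.0 if no threshold reaches the required precision.
    """
    y_true = list(y_true)
    scores = list(scores)
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank examples from highest to lowest score; each distinct score
    # is a candidate threshold ("flag everything scored >= this").
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])

    tp = fp = 0
    best_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all examples tied at this score before evaluating,
        # so ties are never split across the threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        if precision >= min_precision:
            best_recall = max(best_recall, tp / total_pos)
    return best_recall


# Toy data (hypothetical, for illustration only).
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30]
toy_result = recall_at_precision(toy_labels, toy_scores, min_precision=0.80)
print(toy_result)  # 0.5
```

Fixing precision first and then reading off recall matches the abstract's stated motivation: the precision floor bounds how many alerts are false, and the reported recall says how many true cases are still caught under that constraint.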