Academic Article

Predictive Modeling of Lapses in Care for People Living with HIV in Chicago: Algorithm Development and Interpretation
Document Type
article
Source
JMIR Public Health and Surveillance, Vol 9, p e43017 (2023)
Subject
Public aspects of medicine
RA1-1270
Language
English
ISSN
2369-2960
Abstract
Background: Reducing care lapses for people living with HIV is critical to ending the HIV epidemic and beneficial for their health. Predictive modeling can identify clinical factors associated with HIV care lapses. Previous studies have identified these factors within a single clinic or using a national network of clinics, but public health strategies to improve retention in care in the United States often occur within a regional jurisdiction (eg, a city or county).
Objective: We sought to build predictive models of HIV care lapses using a large, multisite, noncurated database of electronic health records (EHRs) in Chicago, Illinois.
Methods: We used 2011-2019 data from the Chicago Area Patient-Centered Outcomes Research Network (CAPriCORN), a database including multiple health systems, covering the majority of 23,580 people with an HIV diagnosis living in Chicago. CAPriCORN uses a hash-based data deduplication method to follow people across multiple Chicago health care systems with different EHRs, providing a unique citywide view of retention in HIV care. From the database, we used diagnosis codes, medications, laboratory tests, demographics, and encounter information to build predictive models. Our primary outcome was lapses in HIV care, defined as having more than 12 months between subsequent HIV care encounters. We built logistic regression, random forest, elastic net logistic regression, and XGBoost models using all variables and compared their performance to a baseline logistic regression model containing only demographics and retention history.
Results: We included people living with HIV with at least 2 HIV care encounters in the database, yielding 16,930 people living with HIV with 191,492 encounters. All models outperformed the baseline logistic regression model, with the most improvement from the XGBoost model (area under the receiver operating characteristic curve 0.776, 95% CI 0.768-0.784 vs 0.674, 95% CI 0.664-0.683; P