Academic Article

Generative Bayesian modeling to nowcast the effective reproduction number from line list data with missing symptom onset dates.
Document Type
Article
Source
PLoS Computational Biology. 4/16/2024, Vol. 20 Issue 4, p1-32. 32p.
Subject
*MISSING data (Statistics)
*INFECTIOUS disease transmission
*DISEASE outbreaks
*SYMPTOMS
*BASIC reproduction number
*COMMUNICABLE diseases
Language
ISSN
1553-734X
Abstract
The time-varying effective reproduction number Rt is a widely used indicator of transmission dynamics during infectious disease outbreaks. Timely estimates of Rt can be obtained from reported cases counted by their date of symptom onset, which is generally closer to the time of infection than the date of report. Case counts by date of symptom onset are typically obtained from line list data; however, these data can have missing information and are subject to right truncation. Previous methods have addressed these problems independently by first imputing missing onset dates, then adjusting truncated case counts, and finally estimating the effective reproduction number. This stepwise approach makes it difficult to propagate uncertainty and can introduce subtle biases during real-time estimation due to the continued impact of assumptions made in previous steps. In this work, we integrate imputation, truncation adjustment, and Rt estimation into a single generative Bayesian model, allowing direct joint inference of case counts and Rt from line list data with missing symptom onset dates. We then use this framework to compare the performance of nowcasting approaches with different stepwise and generative components on synthetic line list data for multiple outbreak scenarios and across different epidemic phases. We find that under reporting delays realistic for hospitalization data (50% of reports delayed by more than a week), intermediate smoothing, as is common practice in stepwise approaches, can bias nowcasts of case counts and Rt, which is avoided in a joint generative approach due to shared regularization of all model components. On incomplete line list data, a fully generative approach enables the quantification of uncertainty due to missing onset dates without the need for an initial multiple imputation step.
In a real-world comparison using hospitalization line list data from the COVID-19 pandemic in Switzerland, we observe the same qualitative differences between approaches. The generative modeling components developed in this work have been integrated and further extended in the R package epinowcast, providing a flexible and interpretable tool for real-time surveillance.
Author summary: During an infectious disease outbreak, public health authorities require timely indicators of transmission dynamics, such as the effective reproduction number Rt. Since reporting data are delayed and often incomplete, statistical methods must be employed to obtain real-time estimates of case numbers and Rt. Existing methods involve separate steps for imputing missing data, adjusting for reporting delays, and estimating Rt. This stepwise approach impedes uncertainty quantification and can lead to inconsistent smoothing assumptions across steps. In this paper, we propose an alternative approach based on generative Bayesian modeling which integrates all steps into a single nowcasting model that can be directly fit to observed data. Using synthetic and real-world line list data, we demonstrate that the generative approach better captures uncertainty and avoids bias from inconsistent assumptions. The model components of our approach have been integrated into the R package epinowcast for easy use in practice. [ABSTRACT FROM AUTHOR]
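Illustrative note (not part of the author abstract): the right truncation mentioned above can be made concrete with a small sketch. It is not the paper's generative Bayesian model and not the epinowcast API. It shows only the simplest deterministic form of the stepwise "truncation adjustment" idea, in which each recent count is divided by the fraction of cases expected to have been reported so far. The delay distribution, the flat epidemic of 100 onsets per day, and the function names are all hypothetical, and the sketch carries no uncertainty at all, which is one of the limitations the abstract attributes to stepwise approaches.

```python
def delay_cdf(delay_pmf):
    """Cumulative probability that a case is reported within d days of onset."""
    cdf, total = [], 0.0
    for p in delay_pmf:
        total += p
        cdf.append(total)
    return cdf


def adjust_for_truncation(observed, delay_pmf):
    """Scale up right-truncated counts by the expected reported fraction.

    observed[t] is the number of cases with symptom onset on day t that
    have been reported by the last day of the series. Assumes that
    delay_pmf[0] > 0, so the reported fraction is never zero.
    """
    cdf = delay_cdf(delay_pmf)
    last_day = len(observed) - 1
    max_delay = len(cdf) - 1
    nowcast = []
    for t, count in enumerate(observed):
        # Days elapsed since onset, capped at the longest possible delay.
        elapsed = min(last_day - t, max_delay)
        nowcast.append(count / cdf[elapsed])
    return nowcast


# Hypothetical reporting delay distribution over delays of 0..3 days.
delay_pmf = [0.1, 0.2, 0.3, 0.4]

# Hypothetical flat epidemic: 100 symptom onsets on each of 6 days.
true_onsets = [100] * 6

# Expected counts seen on the last day: recent onset dates are incomplete.
cdf = delay_cdf(delay_pmf)
last_day = len(true_onsets) - 1
observed = [
    n * cdf[min(last_day - t, len(cdf) - 1)]
    for t, n in enumerate(true_onsets)
]

nowcast = adjust_for_truncation(observed, delay_pmf)
```

In this toy setting the observed counts fall off sharply for the most recent onset dates even though the underlying epidemic is flat, and the adjustment recovers the flat curve only because the delay distribution is assumed known and the counts are noise-free.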