Academic Paper

Multiple imputation of incomplete multilevel data using Heckman selection models.
Document Type
Academic Journal
Author
Muñoz J (Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands)
Efthimiou O (Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland; Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland)
Audigier V (Conservatoire national des arts et métiers (CNAM), Laboratoire CEDRIC-MSDMA, Paris, France)
de Jong VMT (Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Data Analytics and Methods Task Force, European Medicines Agency, Amsterdam, The Netherlands)
Debray TPA (Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Smart Data Analysis and Statistics, Utrecht, The Netherlands)
Source
Publisher: Wiley; Country of Publication: England; NLM ID: 8215016; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1097-0258 (Electronic); Linking ISSN: 0277-6715; NLM ISO Abbreviation: Stat Med; Subsets: MEDLINE
Language
English
Abstract
Missing data is a common problem in medical research and is often addressed using multiple imputation. Although traditional imputation methods allow for valid statistical inference when data are missing at random (MAR), their implementation is problematic when the presence of missingness depends on unobserved variables, that is, when the data are missing not at random (MNAR). Unfortunately, this MNAR situation is rather common in observational studies, registries, and other sources of real-world data. While several imputation methods have been proposed for handling MNAR data in individual studies, their application and validity in large datasets with a multilevel structure remain unclear. We therefore explored in depth the consequences of MNAR data in hierarchical datasets and proposed a novel multilevel imputation method for common patterns of missingness in clustered datasets. This method is based on the principles of Heckman selection models and adopts a two-stage meta-analysis approach to impute binary and continuous variables that may be outcomes or predictors and that are systematically missing (missing for all participants in a cluster) or sporadically missing (missing for only some participants in a cluster). After evaluating the proposed imputation model in simulated scenarios, we illustrate its use in a cross-sectional community survey to estimate the prevalence of malaria parasitemia in children aged 2-10 years in five regions in Uganda.
(© 2023 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.)
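The two ingredients named in the abstract, a Heckman selection model and a two-stage meta-analysis, can be illustrated with the minimal sketch below. It is not the authors' implementation. It assumes that stage 1 (fitting a model separately in each cluster) has already produced one coefficient estimate and its variance per cluster, and it uses the DerSimonian-Laird random-effects estimator as an assumed stage-2 pooling method. `dersimonian_laird`, `draw_new_cluster_parameter`, and `impute_heckman` are hypothetical helper names. The draw for a systematically missing cluster treats the between-cluster variance tau^2 as known, which is a simplification. The Heckman draw assumes a continuous outcome with known linear predictors, with the error terms of the outcome and selection equations following a bivariate normal distribution with correlation rho.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (cdf, inv_cdf)


def dersimonian_laird(estimates, variances):
    """Stage 2 (assumed): pool cluster-specific estimates with a DerSimonian-Laird
    random-effects model. Returns (pooled mean, its variance, tau^2)."""
    k = len(estimates)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-cluster variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    sw_re = sum(w_re)
    mu_re = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sw_re
    return mu_re, 1.0 / sw_re, tau2


def draw_new_cluster_parameter(mu, var_mu, tau2, rng):
    """Draw a coefficient for a cluster in which the variable is systematically missing.
    Simplification: propagates uncertainty in mu but treats tau2 as known."""
    mu_star = rng.gauss(mu, math.sqrt(var_mu))
    return rng.gauss(mu_star, math.sqrt(tau2))


def impute_heckman(x_beta, z_gamma, sigma, rho, rng):
    """Draw a missing outcome under a Heckman selection model (hypothetical helper).

    Outcome:   y = x'beta + eps
    Selection: y is observed iff z'gamma + eta > 0
    (eps / sigma, eta) ~ standard bivariate normal with correlation rho.
    A missing y therefore requires eta < -z'gamma.
    """
    p_missing = STD_NORMAL.cdf(-z_gamma)                  # P(eta < -z'gamma)
    u = p_missing * (1.0 - rng.random())                  # uniform on (0, p_missing]
    u = min(max(u, 1e-12), 1.0 - 1e-12)                   # keep inv_cdf in (0, 1)
    eta = STD_NORMAL.inv_cdf(u)                           # truncated normal draw
    nu = rng.gauss(0.0, 1.0)                              # independent noise
    eps = sigma * (rho * eta + math.sqrt(1.0 - rho ** 2) * nu)
    return x_beta + eps


if __name__ == "__main__":
    rng = random.Random(2023)
    # Hypothetical illustrative values, not data from the Uganda survey.
    est = [0.8, 1.1, 0.95, 1.3, 0.7]
    var = [0.04, 0.05, 0.03, 0.06, 0.05]
    mu, var_mu, tau2 = dersimonian_laird(est, var)
    beta_new = draw_new_cluster_parameter(mu, var_mu, tau2, rng)
    y_imp = impute_heckman(x_beta=beta_new * 1.5, z_gamma=0.2,
                           sigma=1.0, rho=0.6, rng=rng)
    print(round(mu, 3), round(tau2, 4), round(y_imp, 3))
```

With rho > 0, values drawn for non-respondents are shifted downward relative to x'beta. This shift is the MNAR mechanism that an MAR imputation model would ignore. When rho = 0, the draw reduces to an ordinary MAR-style normal draw around x'beta.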