Academic Journal Article

Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments.

Document Type

Article

Author

Carry, Patrick M.; Vigers, Tim; Vanderlinden, Lauren A.; Keeter, Carson; Dong, Fran; Buckner, Teresa; Litkowski, Elizabeth; Yang, Ivana; Norris, Jill M.; Kechris, Katerina

Source

BMC Bioinformatics. 3/7/2023, Vol. 24 Issue 1, p1-18. 18p.

Subject

*BIOLOGICAL variation
*GENE expression
*NULL hypothesis
*ROOT-mean-squares
*ISLANDS of Langerhans

Language

ISSN

1471-2105

Abstract

Background: We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was compared to randomization and stratified randomization in a case–control study (30 per group) with a covariate (case vs control, represented as β1, set to be null) and two biologically relevant confounding variables (age, represented as β2, and hemoglobin A1c (HbA1c), represented as β3). Gene expression values were obtained from a publicly available dataset of expression data from pancreas islet cells. Batch effects were simulated as twice the median biological variation across the gene expression dataset and were added to the publicly available dataset to simulate a batch effect condition. Bias was calculated as the absolute difference between observed betas under the batch allocation strategies and the true beta (no batch effects). Bias was also evaluated after adjustment for batch effects using ComBat as well as a linear regression model. To understand the performance of our optimal allocation strategy under the alternative hypothesis, we also evaluated bias at a single gene associated with both age and HbA1c levels in the 'true' dataset (CAPN13 gene).

Results: Before batch correction, under the null hypothesis (β1), maximum absolute bias and root mean square (RMS) of maximum absolute bias were minimized using the optimal allocation strategy. Under the alternative hypothesis (β2 and β3 for the CAPN13 gene), maximum absolute bias and RMS of maximum absolute bias were also consistently lower using the optimal allocation strategy. ComBat and the regression batch adjustment methods performed well, as the bias estimates moved towards the true values in all conditions under both the null and alternative hypotheses. Although the differences between methods were less pronounced following batch correction, estimates of bias (average and RMS) were consistently lower using the optimal allocation strategy under both the null and alternative hypotheses.

Conclusions: Our algorithm provides an extremely flexible and effective method for assigning samples to batches by exploiting knowledge of covariates prior to sample allocation. [ABSTRACT FROM AUTHOR]
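The allocation idea in the abstract (choose the batch assignment whose batches have the most similar average propensity score) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the propensity scores have already been estimated (for example, from a model of case status on age and HbA1c), it measures imbalance as the gap between the largest and smallest batch means, and it searches a random sample of balanced allocations rather than all possible ones. The function names `batch_spread` and `choose_allocation`, and the example scores, are hypothetical.

```python
import random
from statistics import mean


def batch_spread(scores, allocation, n_batches):
    """Gap between the largest and smallest per-batch mean propensity score."""
    means = [
        mean([s for s, b in zip(scores, allocation) if b == k])
        for k in range(n_batches)
    ]
    return max(means) - min(means)


def choose_allocation(scores, n_batches, n_candidates=2000, seed=0):
    """Assumed simplification: sample random balanced allocations and keep
    the one with the smallest spread, instead of enumerating every allocation."""
    rng = random.Random(seed)
    # Balanced labels: batch sizes differ by at most one sample.
    labels = [i % n_batches for i in range(len(scores))]
    best, best_spread = None, float("inf")
    for _ in range(n_candidates):
        # The first candidate evaluated is the unshuffled round-robin assignment.
        spread = batch_spread(scores, labels, n_batches)
        if spread < best_spread:
            best, best_spread = labels.copy(), spread
        rng.shuffle(labels)
    return best, best_spread


# Hypothetical propensity scores for 12 samples, split into 3 batches of 4.
score_rng = random.Random(1)
scores = [score_rng.random() for _ in range(12)]
allocation, spread = choose_allocation(scores, n_batches=3)
baseline = batch_spread(scores, [i % 3 for i in range(12)], 3)
```

Here `allocation[i]` is the batch index for sample `i`, and `spread` can be compared with `baseline` (the round-robin assignment) to see how much the search reduced the between-batch difference in mean propensity score.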

