학술논문

Generalizability of machine learning in predicting antimicrobial resistance in E. coli: a multi-country case study in Africa
Document Type
article
Source
BMC Genomics, Vol 25, Iss 1, Pp 1-13 (2024)
Subject
Antimicrobial resistance
E. coli
Machine learning
Africa
Whole-genome sequencing
Biotechnology
TP248.13-248.65
Genetics
QH426-470
Language
English
ISSN
1471-2164
Abstract
Abstract Background Antimicrobial resistance (AMR) remains a significant global health threat particularly impacting low- and middle-income countries (LMICs). These regions often grapple with limited healthcare resources and access to advanced diagnostic tools. Consequently, there is a pressing need for innovative approaches that can enhance AMR surveillance and management. Machine learning (ML) though underutilized in these settings, presents a promising avenue. This study leverages ML models trained on whole-genome sequencing data from England, where such data is more readily available, to predict AMR in E. coli, targeting key antibiotics such as ciprofloxacin, ampicillin, and cefotaxime. A crucial part of our work involved the validation of these models using an independent dataset from Africa, specifically from Uganda, Nigeria, and Tanzania, to ascertain their applicability and effectiveness in LMICs. Results Model performance varied across antibiotics. The Support Vector Machine excelled in predicting ciprofloxacin resistance (87% accuracy, F1 Score: 0.57), Light Gradient Boosting Machine for cefotaxime (92% accuracy, F1 Score: 0.42), and Gradient Boosting for ampicillin (58% accuracy, F1 Score: 0.66). In validation with data from Africa, Logistic Regression showed high accuracy for ampicillin (94%, F1 Score: 0.97), while Random Forest and Light Gradient Boosting Machine were effective for ciprofloxacin (50% accuracy, F1 Score: 0.56) and cefotaxime (45% accuracy, F1 Score:0.54), respectively. Key mutations associated with AMR were identified for these antibiotics. Conclusion As the threat of AMR continues to rise, the successful application of these models, particularly on genomic datasets from LMICs, signals a promising avenue for improving AMR prediction to support large AMR surveillance programs. This work thus not only expands our current understanding of the genetic underpinnings of AMR but also provides a robust methodological framework that can guide future research and applications in the fight against AMR.