Academic Paper
Data-Driven Malaria Prevalence Prediction in Large Densely-Populated Urban Holoendemic sub-Saharan West Africa: Harnessing Machine Learning Approaches and 22-years of Prospectively Collected Data
Document Type
Working Paper
Author
Brown, Biobele J.; Przybylski, Alexander A.; Manescu, Petru; Caccioli, Fabio; Oyinloye, Gbeminiyi; Elmi, Muna; Shaw, Michael J.; Pawar, Vijay; Claveau, Remy; Shawe-Taylor, John; Srinivasan, Mandayam A.; Afolabi, Nathaniel K.; Orimadegun, Adebola E.; Ajetunmobi, Wasiu A.; Akinkunmi, Francis; Kowobari, Olayinka; Osinusi, Kikelomo; Akinbami, Felix O.; Omokhodion, Samuel; Shokunbi, Wuraola A.; Lagunju, Ikeoluwa; Sodeinde, Olugbemiro; Fernandez-Reyes, Delmiro
Abstract
Plasmodium falciparum malaria still poses one of the greatest threats to human life, with over 200 million cases globally leading to half a million deaths annually. Of these, 90% of cases and deaths occur in sub-Saharan Africa, mostly among children. Although malaria prediction systems are central to the 2016-2030 malaria Global Technical Strategy, current systems are inadequate at capturing and estimating the burden of disease in highly endemic countries. We developed and validated a computational system that exploits the predictive power of current Machine Learning approaches on 22 years of prospective data from Ibadan, a densely populated urban metropolis in sub-Saharan West Africa with high-transmission, holoendemic malaria. Our dataset of >9×10⁴ screened study participants attending our clinical and community services from 1996 to 2017 contains monthly prevalence, temporal, environmental and host features. Our Locality-specific Elastic-Net based Malaria Prediction System (LEMPS) achieves good generalization performance, in both the magnitude and the direction of the prediction, when tasked to predict monthly prevalence on previously unseen validation data (MAE ≤ 6×10⁻², MSE ≤ 7×10⁻³), within an error tolerance of +0.1 to −0.05 that is relevant and usable for aiding decision support in a holoendemic setting. LEMPS is well suited to malaria prediction, where multiple features are correlated with one another, because trading off between L1-norm and L2-norm regularization strength allows the system to retain stability. Data-driven systems are critical for regionally adaptable surveillance, management of control strategies and resource allocation across stretched healthcare systems.
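The abstract names an Elastic-Net model and MAE/MSE metrics but does not give the penalty parameterization, solver, feature set or validation split. The following is a minimal sketch under stated assumptions, not LEMPS itself. It assumes the common glmnet/scikit-learn form of the elastic-net objective, fitted by cyclic coordinate descent, and runs on synthetic monthly data rather than the Ibadan dataset. The helpers `fit_elastic_net` and `evaluate`, the feature names, the chronological train/validation split, and the sign convention error = predicted − observed for the +0.1/−0.05 tolerance are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: the proximal map of the L1 penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def fit_elastic_net(X, y, alpha=0.01, l1_ratio=0.5, n_iter=500, tol=1e-8):
    """Hypothetical helper: cyclic coordinate descent minimizing
        (1 / 2n) * ||y - b - X w||^2
        + alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2).
    Returns (coef, intercept) on the original feature scale."""
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd                  # standardize so one penalty suits all features
    y_mean = y.mean()
    yc = y - y_mean                    # centering leaves the intercept unpenalized
    w = np.zeros(p)
    r = yc.copy()                      # running residual yc - Z @ w
    col_sq = (Z ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            denom = col_sq[j] + alpha * (1 - l1_ratio)
            if denom == 0:
                continue               # constant column under a pure L1 penalty
            old = w[j]
            # correlation of feature j with the partial residual (feature j removed)
            rho_j = Z[:, j] @ r / n + col_sq[j] * old
            new = soft_threshold(rho_j, alpha * l1_ratio) / denom
            if new != old:
                r -= Z[:, j] * (new - old)
                w[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    coef = w / sd                      # undo standardization
    intercept = y_mean - mu @ coef
    return coef, intercept


def evaluate(y_true, y_pred, upper=0.10, lower=-0.05):
    """MAE, MSE and share of months whose error lies in [lower, upper]."""
    err = y_pred - y_true              # assumed convention: positive = over-prediction
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": float(np.mean(err ** 2)),
        "within_tolerance": float(np.mean((err >= lower) & (err <= upper))),
    }


# Synthetic stand-in for 22 years (264 months) of monthly features; NOT study data.
rng = np.random.default_rng(0)
months = np.arange(264)
season = np.sin(2 * np.pi * months / 12)
rainfall = season + 0.3 * rng.standard_normal(264)
humidity = 0.9 * rainfall + 0.2 * rng.standard_normal(264)   # deliberately correlated
temperature = -0.5 * season + 0.3 * rng.standard_normal(264)
X = np.column_stack([rainfall, humidity, temperature, months / 264])
prevalence = np.clip(0.4 + 0.08 * season - 0.1 * months / 264
                     + 0.02 * rng.standard_normal(264), 0, 1)

split = 240   # train on the first 20 years, validate on the last 2 (unseen) years
coef, b = fit_elastic_net(X[:split], prevalence[:split], alpha=0.005, l1_ratio=0.5)
print(evaluate(prevalence[split:], X[split:] @ coef + b))
```

Setting `l1_ratio=1` gives the lasso, which tends to keep only one of a group of correlated predictors such as the synthetic rainfall and humidity columns, while `l1_ratio=0` gives ridge regression. Intermediate values shrink correlated coefficients together while still zeroing weak ones, which is the stability trade-off the abstract describes.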
Comment: 40 pages, 10 figures