Academic Paper

Potential and limitations of machine meta-learning (ensemble) methods for predicting COVID-19 mortality in a large inhospital Brazilian dataset
Document Type
Original Paper
Author
de Paiva, Bruno Barbosa Miranda; Pereira, Polianna Delfino; de Andrade, Claudio Moisés Valiense; Gomes, Virginia Mara Reis; Souza-Silva, Maira Viana Rego; Martins, Karina Paula Medeiros Prado; Sales, Thaís Lorenna Souza; de Carvalho, Rafael Lima Rodrigues; Pires, Magda Carvalho; Ramos, Lucas Emanuel Ferreira; Silva, Rafael Tavares; de Freitas Martins Vieira, Alessandra; Nunes, Aline Gabrielle Sousa; de Oliveira Jorge, Alzira; de Oliveira Maurílio, Amanda; Scotton, Ana Luiza Bahia Alves; da Silva, Carla Thais Candida Alves; Cimini, Christiane Corrêa Rodrigues; Ponce, Daniela; Pereira, Elayne Crestani; Manenti, Euler Roberto Fernandes; Rodrigues, Fernanda d’Athayde; Anschau, Fernando; Botoni, Fernando Antônio; Bartolazzi, Frederico; Grizende, Genna Maira Santos; Noal, Helena Carolina; Duani, Helena; Gomes, Isabela Moraes; Costa, Jamille Hemétrio Salles Martins; di Sabatino Santos Guimarães, Júlia; Tupinambás, Julia Teixeira; Rugolo, Juliana Machado; Batista, Joanna d’Arc Lyra; de Alvarenga, Joice Coutinho; Chatkin, José Miguel; Ruschel, Karen Brasil; Zandoná, Liege Barella; Pinheiro, Lílian Santos; Menezes, Luanna Silva Monteiro; de Oliveira, Lucas Moyses Carvalho; Kopittke, Luciane; Assis, Luisa Argolo; Marques, Luiza Margoto; Raposo, Magda Cesar; Floriani, Maiara Anschau; Bicalho, Maria Aparecida Camargos; Nogueira, Matheus Carvalho Alves; de Oliveira, Neimy Ramos; Ziegelmann, Patricia Klarmann; Paraiso, Pedro Gibson; de Lima Martelli, Petrônio José; Senger, Roberta; Menezes, Rochele Mosmann; Francisco, Saionara Cristina; Araújo, Silvia Ferreira; Kurtz, Tatiana; Fereguetti, Tatiani Oliveira; de Oliveira, Thainara Conceição; Ribeiro, Yara Cristina Neves Marques Barbosa; Ramires, Yuri Carlotto; Lima, Maria Clara Pontello Barbosa; Carneiro, Marcelo; Bezerra, Adriana Falangola Benjamin; Schwarzbold, Alexandre Vargas; de Moura Costa, André Soares; Farace, Barbara Lopes; Silveira, Daniel Vitorio; de Almeida Cenci, Evelin Paola; Lucas, Fernanda Barbosa; Aranha, Fernando Graça; Bastos, Gisele Alsina Nader; Vietta, Giovanna Grunewald; Nascimento, Guilherme Fagundes; Vianna, Heloisa Reniers; Guimarães, Henrique Cerqueira; de Morais, Julia Drumond Parreiras; Moreira, Leila Beltrami; de Oliveira, Leonardo Seixas; de Deus Sousa, Lucas; de Souza Viana, Luciano; de Souza Cabral, Máderson Alvares; Ferreira, Maria Angélica Pires; de Godoy, Mariana Frizzo; de Figueiredo, Meire Pereira; Guimarães-Junior, Milton Henriques; de Paula de Sordi, Mônica Aparecida; da Cunha Severino Sampaio, Natália; Assaf, Pedro Ledic; Lutkmeier, Raquel; Valacio, Reginaldo Aparecido; Finger, Renan Goulart; de Freitas, Rufino; Guimarães, Silvana Mangeon Meirelles; Oliveira, Talita Fischer; Diniz, Thulio Henrique Oliveira; Gonçalves, Marcos André; Marcolino, Milena Soriano
Source
Scientific Reports. 13(1)
Language
English
ISSN
2045-2322
Abstract
The majority of early prediction scores and methods for COVID-19 mortality suffer from methodological flaws and technological limitations (e.g., reliance on a single prediction model). Our aim is to provide a thorough comparative study that addresses these methodological issues by considering multiple techniques for building mortality prediction models, including modern machine learning (neural) algorithms, traditional statistical techniques, and meta-learning (ensemble) approaches. This study used data from a multicenter cohort of 10,897 adult Brazilian COVID-19 patients admitted from March 2020 to November 2021 (median age 60 years [interquartile range 48–71]; 46% women). We also propose novel population-based meta-features that have not previously been described in the literature. Stacking achieved the best results reported in the literature for the death prediction task, improving Recall for death by more than 46% over the previous state of the art, with an AUROC of 0.826 and a MacroF1 of 65.4%. The newly proposed meta-features were highly discriminative of death but did not produce large improvements in final prediction performance, suggesting that we may be near the limits of the predictive performance achievable with the current set of ML techniques and (meta-)features. Finally, we investigated how the trained models perform across hospitals and found large differences in classifier performance between them, which further supports the view that errors arise from factors that cannot be modeled with the current predictors.
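The abstract summarizes performance with Recall for the death class, MacroF1, and AUROC. It does not describe the authors' evaluation code, so what follows is only a minimal pure-Python sketch of the standard definitions of these three metrics for a binary death/survival classifier. The function names (`confusion_counts`, `recall_for_class`, `f1_for_class`, `macro_f1`, `auroc`), the toy labels and scores, and the 0.5 decision threshold are all hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for class `cls`, treating it as the positive class."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == cls and t == cls:
            tp += 1
        elif p == cls:
            fp += 1  # predicted cls, but the true label is another class
        elif t == cls:
            fn += 1  # true label is cls, but another class was predicted
    return tp, fp, fn


def recall_for_class(y_true, y_pred, cls=1):
    """Recall (sensitivity) for one class, e.g. death = 1: TP / (TP + FN)."""
    tp, _, fn = confusion_counts(y_true, y_pred, cls)
    return tp / (tp + fn) if tp + fn else 0.0


def f1_for_class(y_true, y_pred, cls):
    """F1 for one class: 2TP / (2TP + FP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so the minority class (death) counts as much as survival."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive is scored above a random
    negative (ties count 1/2). O(P*N) pairwise version, fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data (NOT study data): 1 = in-hospital death, 0 = survival.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8, 0.1, 0.3]  # predicted probability of death
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # hypothetical 0.5 decision threshold

print(f"Recall(death)={recall_for_class(y_true, y_pred):.3f}  "
      f"MacroF1={macro_f1(y_true, y_pred):.3f}  AUROC={auroc(y_true, scores):.3f}")
# prints: Recall(death)=0.667  MacroF1=0.733  AUROC=0.867
```

Recall and MacroF1 depend on the decision threshold applied to the scores, whereas AUROC is threshold-free. For real datasets, scikit-learn provides `recall_score`, `f1_score(..., average="macro")` and `roc_auc_score` for the same quantities.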