Academic paper

Host genetics and COVID-19 severity: increasing the accuracy of latest severity scores by Boolean quantum features
Document Type
article
Author
Gabriele Martelloni; Alessio Turchi; Chiara Fallerini; Andrea Degl’Innocenti; Margherita Baldassarri; Simona Olmi; Simone Furini; Alessandra Renieri; GEN-COVID Multicenter study; Francesca Mari; Sergio Daga; Ilaria Meloni; Mirella Bruttini; Susanna Croci; Mirjam Lista; Debora Maffeo; Elena Pasquinelli; Giulia Brunelli; Kristina Zguro; Viola Bianca Serio; Enrica Antolini; Simona Letizia Basso; Samantha Minetto; Giulia Rollo; Martina Rozza; Angela Rina; Rossella Tita; Maria Antonietta Mencarelli; Caterina Lo Rizzo; Anna Maria Pinto; Francesca Ariani; Francesca Montagnani; Mario Tumbarello; Ilaria Rancan; Massimiliano Fabbiani; Elena Bargagli; Laura Bergantini; Miriana d’Alessandro; Paolo Cameli; David Bennett; Federico Anedda; Simona Marcantonio; Sabino Scolletta; Federico Franchi; Maria Antonietta Mazzei; Susanna Guerrini; Edoardo Conticini; Luca Cantarini; Bruno Frediani; Danilo Tacconi; Chiara Spertilli Raffaelli; Arianna Emiliozzi; Marco Feri; Alice Donati; Raffaele Scala; Luca Guidelli; Genni Spargi; Marta Corridi; Cesira Nencioni; Leonardo Croci; Gian Piero Caldarelli; Davide Romani; Paolo Piacentini; Maria Bandini; Elena Desanctis; Silvia Cappelli; Anna Canaccini; Agnese Verzuri; Valentina Anemoli; Manola Pisani; Agostino Ognibene; Maria Lorubbio; Alessandro Pancrazzi; Massimo Vaghi; Antonella D’Arminio Monforte; Federica Gaia Miraglia; Mario U. Mondelli; Stefania Mantovani; Raffaele Bruno; Marco Vecchia; Marcello Maffezzoni; Enrico Martinelli; Massimo Girardis; Stefano Busani; Sophie Venturelli; Andrea Cossarizza; Andrea Antinori; Alessandra Vergori; Stefano Rusconi; Matteo Siano; Arianna Gabrieli; Agostino Riva; Daniela Francisci; Elisabetta Schiaroli; Carlo Pallotto; Saverio Giuseppe Parisi; Monica Basso; Sandro Panese; Stefano Baratti; Pier Giorgio Scotton; Francesca Andretta; Mario Giobbia; Renzo Scaggiante; Francesca Gatti; Francesco Castelli; Eugenia Quiros-Roldan; Melania Degli Antoni; Isabella Zanella; Matteo della Monica; Carmelo Piscopo; Mario Capasso; Roberta Russo; Immacolata Andolfo; Achille Iolascon; Giuseppe Fiorentino; Massimo Carella; Marco Castori; Giuseppe Merla; Gabriella Maria Squeo; Filippo Aucella; Pamela Raggi; Rita Perna; Matteo Bassetti; Antonio Di Biagio; Maurizio Sanguinetti; Luca Masucci; Alessandra Guarnaccia; Serafina Valente; Alex Di Florio; Marco Mandalà; Alessia Giorli; Lorenzo Salerni; Patrizia Zucchi; Pierpaolo Parravicini; Elisabetta Menatti; Tullio Trotta; Ferdinando Giannattasio; Gabriella Coiro; Fabio Lena; Gianluca Lacerenza; Cristina Mussini; Luisa Tavecchia; Lia Crotti; Gianfranco Parati; Roberto Menè; Maurizio Sanarico; Marco Gori; Francesco Raimondi; Alessandra Stella; Filippo Biscarini; Tiziana Bachetti; Maria Teresa La Rovere; Maurizio Bussotti; Serena Ludovisi; Katia Capitani; Simona Dei; Sabrina Ravaglia; Annarita Giliberti; Giulia Gori; Rosangela Artuso; Elena Andreucci; Angelica Pagliazzi; Erika Fiorentini; Antonio Perrella; Francesco Bianchi; Paola Bergomi; Emanuele Catena; Riccardo Colombo; Sauro Luchi; Giovanna Morelli; Paola Petrocelli; Sarah Iacopini; Sara Modica; Silvia Baroni; Giulia Micheli; Marco Falcone; Donato Urso; Giusy Tiseo; Tommaso Matucci; Davide Grassi; Claudio Ferri; Franco Marinangeli; Francesco Brancati; Antonella Vincenti; Valentina Borgo; Stefania Lombardi; Mirco Lenzi; Massimo Antonio Di Pietro; Francesca Vichi; Benedetta Romanin; Letizia Attala; Cecilia Costa; Andrea Gabbuti; Alessio Bellucci; Marta Colaneri; Patrizia Casprini; Cristoforo Pomara; Massimiliano Esposito; Roberto Leoncini; Michele Cirianni; Lucrezia Galasso; Marco Antonio Bellini; Chiara Gabbi; Nicola Picchiotti
Source
Frontiers in Genetics, Vol 15 (2024)
Subject
COVID-19
host genetics
integrated polygenic score
genetic algorithm
logistic regression
genetic science modeling
Genetics
QH426-470
Language
English
ISSN
1664-8021
Abstract
The impact of common and rare variants in COVID-19 host genetics has been widely studied. In particular, in Fallerini et al. (Human Genetics, 2022, 141, 147–173), common and rare variants were used to define an interpretable machine learning model for predicting COVID-19 severity. First, variants were converted into sets of Boolean features, depending on the absence or presence of variants in each gene. An ensemble of LASSO logistic regression models was then used to identify the most informative Boolean features with respect to the genetic bases of severity. The Boolean features selected by these logistic models were then combined into an Integrated PolyGenic Score (IPGS), which offers a very simple description of the contribution of host genetics to COVID-19 severity. IPGS leads to an accuracy of 55%–60% on different cohorts and, after a logistic regression with both IPGS and age as inputs, to an accuracy of 75%. The goal of this paper is to improve on the previous results, using not only the most informative Boolean features with respect to the genetic bases of severity but also information on the host organs involved in the disease. In this study, we generalize the IPGS by adding a statistical weight for each organ, through the transformation of Boolean features into “Boolean quantum features,” inspired by quantum mechanics. The organ coefficients were set using the genetic algorithm library PyGAD, after which we defined two new integrated polygenic scores (IPGSph1 and IPGSph2). By applying a logistic regression with IPGS, IPGSph2 (or, equivalently, IPGSph1), and age as inputs, we reached an accuracy of 84%–86%, improving on the results previously shown in Fallerini et al. (Human Genetics, 2022, 141, 147–173) by roughly 10 percentage points.
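As a rough illustration of the first step described in the abstract, the sketch below converts a patient's variant list into per-gene Boolean features (1 if the gene carries at least one variant, 0 otherwise) and then combines them into a toy organ-weighted score. Everything beyond the presence/absence encoding is an assumption made for illustration: the gene names, the variant identifiers, the +1/-1 signs, the organ assignments, and the fixed organ weights are all hypothetical, and the scoring formula is not the paper's IPGS, IPGSph1, or IPGSph2 definition. In the paper the organ coefficients are fitted with PyGAD; here they are simply hard-coded.

```python
from typing import Dict, Iterable, List

# Hypothetical gene panel and variant-to-gene annotation (not from the paper).
GENES: List[str] = ["GENE_A", "GENE_B", "GENE_C"]
VARIANT_TO_GENE: Dict[str, str] = {
    "v1": "GENE_A",
    "v2": "GENE_A",
    "v3": "GENE_C",
    "v4": "GENE_B",
}

# Hypothetical direction of effect: +1 severity-associated, -1 protective.
SIGNS: Dict[str, int] = {"GENE_A": 1, "GENE_B": -1, "GENE_C": 1}

# Hypothetical organ assignment and hard-coded organ weights.
GENE_TO_ORGAN: Dict[str, str] = {
    "GENE_A": "lung",
    "GENE_B": "heart",
    "GENE_C": "lung",
}
ORGAN_WEIGHTS: Dict[str, float] = {"lung": 0.5, "heart": 0.25}


def boolean_features(patient_variants: Iterable[str]) -> List[int]:
    """Return one 0/1 flag per gene: 1 if the patient carries any variant in it."""
    hit_genes = {
        VARIANT_TO_GENE[v] for v in patient_variants if v in VARIANT_TO_GENE
    }
    return [1 if gene in hit_genes else 0 for gene in GENES]


def organ_weighted_score(features: List[int]) -> float:
    """Toy score: signed sum of Boolean features, each scaled by its organ weight."""
    return sum(
        flag * SIGNS[gene] * ORGAN_WEIGHTS[GENE_TO_ORGAN[gene]]
        for flag, gene in zip(features, GENES)
    )


# Two variants in the same gene still yield a single Boolean hit.
features_p1 = boolean_features(["v1", "v2"])
score_p1 = organ_weighted_score(features_p1)

# A patient with one severity-associated and one protective gene hit.
features_p2 = boolean_features(["v1", "v4", "unannotated_variant"])
score_p2 = organ_weighted_score(features_p2)

print(features_p1, score_p1)
print(features_p2, score_p2)
```

In a fuller pipeline, scores like this would be fed, together with age, into a logistic regression; that step is omitted here because the abstract does not specify its details.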