Academic Paper

Phenotypic modelling of Crohn's disease severity : a machine learning approach
Document Type
Electronic Thesis or Dissertation
Subject
616.3
Inflammatory bowel disease
Machine learning
Language
English
Abstract
The growing availability of complex healthcare data holds great promise for improvements in medicine. However, new methodological developments are necessary to realise the potential of these resources. In this thesis we focused upon phenotypic modelling for chronic disease applications, specifically inflammatory bowel disease (IBD). Patients with IBD experience varying clinical trajectories, with the disease course ranging greatly in terms of severity. Our goal was to develop methods to capture IBD patient severity, using data from the electronic health record, and to then use our representations of severity to discover subgroups of patients with similar characteristics. The establishment of patient subgroups and associated genomic factors can enable precision medicine approaches to improve patient care. Faced with the challenge of unevenly sampled and sparse clinical time series data, we have proposed a novel approach founded in extreme value theory (EVT) as a means to convert these measurements into interpretable metrics of patient abnormality. We show that our metrics are specifically useful in the modelling of IBD patients, as they provide a condensed representation of a patient's aggregate biochemistry and haematology dynamics. We also found that patient biomarker-based severity is much more representative of classical severity metrics for patients with ulcerative colitis (UC, one of the two primary IBD subtypes) than it is for patients with Crohn's disease (CD, the second subtype). Thus motivated to combine both our EVT-based and classical phenotypic representations of severity, we implemented a Bayesian clustering model so as to identify latent patient severity profiles. Our model is capable of handling missing data and inferring the number of underlying clusters.
We found that consistent patient subgroups were identifiable in our patient cohort, with the majority of CD patients falling into subgroups with severe phenotypic behaviour and the majority of UC patients exhibiting less severe behaviour. Having identified patient subgroups, we performed a hypothesis-generating association analysis to relate these subgroups (and other clinical features) to genetic loci previously associated with IBD susceptibility. We have presented a number of nominal associations, several of which have plausible biologic mechanisms. Finally, we have illustrated how we can use traditional approaches, machine learning techniques, and our presented EVT methods to answer a practical clinical question regarding the efficacy of two important IBD last-line medications. We examined these drugs in terms of their relative effectiveness, our ability to predict patient response, and the characterisation of this response. We were able to identify distinct ways in which patients respond to these drugs, while also finding that our retrospective data reveals no apparent difference between the two drugs in their effectiveness. In summary, we have developed a set of methods that can be applied to the challenging problem of finding patterns across patients with heterogeneous disease. Upon using these methods within our specific IBD application area, we have obtained clinically and scientifically useful results.