Academic Article

Patient contrastive learning: A performant, expressive, and practical approach to electrocardiogram modeling.
Document Type
Article
Source
PLoS Computational Biology. 2/14/2022, Vol. 18 Issue 2, p1-16. 16p. 1 Diagram, 3 Charts, 5 Graphs.
Subject
*DEEP learning
*SUPERVISED learning
*ARTIFICIAL neural networks
*LEFT ventricular hypertrophy
*ATRIAL fibrillation
*MACHINE learning
Language
English
ISSN
1553-734X
Abstract
Supervised machine learning applications in health care are often limited by a scarcity of labeled training data. To mitigate the effect of small sample sizes, we introduce a pre-training approach, Patient Contrastive Learning of Representations (PCLR), which uses contrastive learning to create latent representations of electrocardiograms (ECGs) from a large number of unlabeled examples. The resulting representations are expressive, performant, and practical across a wide spectrum of clinical tasks. We develop PCLR using a large health care system with over 3.2 million 12-lead ECGs and demonstrate that training linear models on PCLR representations achieves a 51% performance increase, on average, over six training-set sizes and four tasks (sex classification, age regression, and the detection of left ventricular hypertrophy and atrial fibrillation), relative to training neural network models from scratch. We also compare PCLR to three other ECG pre-training approaches (supervised pre-training, unsupervised pre-training with an autoencoder, and pre-training using a contrastive multi-ECG-segment approach) and show significant performance benefits in three of the four tasks. PCLR provides an average performance benefit of 47% over the other models and an average 9% performance benefit over the best competing model for each task. We release PCLR to enable others to extract ECG representations at https://github.com/broadinstitute/ml4h/tree/master/model_zoo/PCLR.

Author summary: ECGs are a rich source of cardiac health information. Many recent works have shown that deep learning can extract new information from ECGs when sufficient labeled data are available. However, when labeled data are scarce, or when a clinician-scientist lacks the resources to train a deep learning model from scratch, options are limited. We introduce Patient Contrastive Learning of Representations (PCLR), an approach to training a neural network that extracts representations of ECGs. The only labels required to train PCLR are which ECG comes from which patient. The resulting ECG representations can be used directly in linear models for new tasks without fine-tuning the neural network. We show that PCLR outperforms a set of hand-picked features on four tasks, and outperforms three other deep learning approaches on three of the four tasks evaluated. Furthermore, PCLR is better than training a neural network from scratch when training data are limited. PCLR is one of the first attempts to release and evaluate a pre-trained ECG model for the purpose of accelerating deep learning ECG research.
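The author summary describes the core recipe: pre-train an encoder so that two ECGs recorded from the same patient map to nearby representations, then fit linear models on the frozen representations for downstream tasks. Below is a minimal PyTorch sketch of that recipe, not the released implementation; the encoder architecture, temperature, batch layout (rows 2i and 2i+1 hold two ECGs from the same patient), and all tensor shapes are illustrative assumptions.

```python
# Minimal sketch of patient contrastive pre-training (assumed details,
# not the authors' released code or architecture).
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.linear_model import LogisticRegression

class ECGEncoder(nn.Module):
    """Toy 1D-CNN mapping a 12-lead ECG to a fixed-size representation."""
    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(12, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time to one vector per ECG
        )
        self.proj = nn.Linear(64, emb_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 12 leads, samples)
        h = self.net(x).squeeze(-1)  # (batch, 64)
        return self.proj(h)          # (batch, emb_dim)

def patient_contrastive_loss(z: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent-style loss: each ECG's positive is the other ECG from the
    same patient; every other ECG in the batch serves as a negative."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature            # (2N, 2N) cosine similarities
    sim.fill_diagonal_(float("-inf"))        # never match an ECG with itself
    # Row 2i's positive is row 2i+1 and vice versa.
    pos = torch.arange(z.shape[0]) ^ 1
    return F.cross_entropy(sim, pos.to(z.device))

# One pre-training step on synthetic data: 8 patients x 2 ECGs,
# 10 s at 250 Hz (shapes are placeholders).
encoder = ECGEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
batch = torch.randn(16, 12, 2500)
opt.zero_grad()
loss = patient_contrastive_loss(encoder(batch))
loss.backward()
opt.step()

# Downstream use: freeze the encoder and fit a linear model on its outputs.
with torch.no_grad():
    feats = encoder(batch).numpy()
labels = np.array([0, 1] * 8)  # placeholder task labels
clf = LogisticRegression(max_iter=1000).fit(feats, labels)
```

In this sketch the only supervision during pre-training is patient identity, matching the paper's premise; the downstream task then needs only enough labeled data to fit a linear model rather than a full neural network.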