학술논문

Predicting EGFR mutational status from pathology images using a real-world dataset
Document Type
article
Source
Scientific Reports, Vol 13, Iss 1, Pp 1-13 (2023)
Subject
Medicine
Science
Language
English
ISSN
2045-2322
Abstract
Abstract Treatment of non-small cell lung cancer is increasingly biomarker driven with multiple genomic alterations, including those in the epidermal growth factor receptor (EGFR) gene, that benefit from targeted therapies. We developed a set of algorithms to assess EGFR status and morphology using a real-world advanced lung adenocarcinoma cohort of 2099 patients with hematoxylin and eosin (H&E) images exhibiting high morphological diversity and low tumor content relative to public datasets. The best performing EGFR algorithm was attention-based and achieved an area under the curve (AUC) of 0.870, a negative predictive value (NPV) of 0.954 and a positive predictive value (PPV) of 0.410 in a validation cohort reflecting the 15% prevalence of EGFR mutations in lung adenocarcinoma. The attention model outperformed a heuristic-based model focused exclusively on tumor regions, and we show that although the attention model also extracts signal primarily from tumor morphology, it extracts additional signal from non-tumor tissue regions. Further analysis of high-attention regions by pathologists showed associations of predicted EGFR negativity with solid growth patterns and higher peritumoral immune presence. This algorithm highlights the potential of deep learning tools to provide instantaneous rule-out screening for biomarker alterations and may help prioritize the use of scarce tissue for biomarker testing.