Academic Article

SHINE: protein language model-based pathogenicity prediction for short inframe insertion and deletion variants.

Document Type

Article

Author

Fan, Xiao; Pan, Hongbing; Tian, Alan; Chung, Wendy K; Shen, Yufeng

Source

Briefings in Bioinformatics. Jan2023, Vol. 24 Issue 1, p1-7. 7p.

Subject

*AMINO acid sequence
*SUPERVISED learning
*LANGUAGE models
*PROTEIN structure
*PROTEIN models

Language

ISSN

1467-5463

Abstract

Accurate variant pathogenicity predictions are important in genetic studies of human diseases. Inframe insertion and deletion variants (indels) alter protein sequence and length, but are not as deleterious as frameshift indels. Interpreting inframe indels is challenging because few known pathogenic variants are available for training. Existing prediction methods largely use manually encoded features, including conservation, protein structure and function, and allele frequency, to infer variant pathogenicity. Recent advances in deep learning modeling of protein sequences and structures provide an opportunity to improve the representation of salient features based on large numbers of protein sequences. We developed a new pathogenicity predictor for SHort Inframe iNsertion and dEletion (SHINE). SHINE uses pretrained protein language models to construct a latent representation of an indel and its protein context from protein sequences and multiple protein sequence alignments, and feeds the latent representation into supervised machine learning models for pathogenicity prediction. We curated training data from ClinVar and gnomAD, and created two test datasets from different sources. SHINE achieved better prediction performance than existing methods for both deletion and insertion variants in these two test datasets. Our work suggests that unsupervised protein language models can provide valuable information about proteins, and new methods based on these models can improve variant interpretation in genetic analyses. [ABSTRACT FROM AUTHOR]
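
Illustrative sketch (not from the article): the abstract describes a two-stage design, in which a pretrained protein language model supplies a latent representation of the indel and its context, and a supervised model then scores pathogenicity. The Python below is a minimal sketch of that general pattern only. Everything specific in it is an assumption made for illustration: the mean pooling over a flanking window, the `window` parameter, the plain logistic-regression classifier, and the function names `indel_features`, `train_logistic`, and `predict_proba`. The per-residue embeddings are expected as a list of numeric vectors that some language model would produce; the abstract does not specify SHINE's actual pooling, features, or classifier.

```python
import math


def indel_features(residue_embeddings, start, end, window=2):
    """Mean-pool per-residue embeddings over positions [start, end) plus
    `window` flanking residues on each side (clamped to the protein ends).
    The pooling scheme is an assumption, not the published method."""
    lo = max(0, start - window)
    hi = min(len(residue_embeddings), end + window)
    span = residue_embeddings[lo:hi]
    dim = len(span[0])
    return [sum(row[j] for row in span) / len(span) for j in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.
    X: list of feature vectors; y: list of 0/1 labels (1 = pathogenic)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Return the model's probability that feature vector x is pathogenic."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In use, one would call `indel_features` once per variant on embeddings from a real pretrained model, stack the resulting vectors with benign/pathogenic labels (the article draws its labels from ClinVar and gnomAD), and pass them to `train_logistic`. A stronger classifier could be substituted for the logistic regression without changing the overall shape of the pipeline.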
