Academic Paper

Statistical Compression of Protein Folding Patterns for Inference of Recurrent Substructural Themes

Document Type

Conference

Author

Subramanian, R.; Allison, L.; Stuckey, P.J.; Garcia de la Banda, M.; Abramson, D.; Lesk, A.M.; Konagurthu, A.S.

Source

2017 Data Compression Conference (DCC), pp. 340-349, Apr. 2017

Subject

Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Proteins
Dictionaries
Three-dimensional displays
Channel coding
Geometry
Estimation
Protein structure
super-secondary structural patterns
Minimum Message Length
MML

Language

ISSN

2375-0359

Abstract

Computational analyses of the growing corpus of three-dimensional (3D) structures of proteins have revealed a limited set of recurrent substructural themes, termed super-secondary structures. Knowledge of super-secondary structures is important for the study of protein evolution and for the modeling of proteins with unknown structures. Characterizing a comprehensive dictionary of these super-secondary structures has been an unanswered computational challenge in protein structural studies. This paper presents an unsupervised method for learning such a comprehensive dictionary using the statistical framework of lossless compression on a database comprised of concise geometric representations of protein 3D folding patterns. The best dictionary is defined as the one that yields the most compression of the database. Here we describe the inference methodology and the statistical models used to estimate the encoding lengths. An interactive website for this dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html.
