
Protein secondary structure prediction using periodic-quadratic-logistic models: statistical and theoretical issues
Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, 1994. 5:375-384
Computing and Processing
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Neural networks
Information theory
Biomedical computing
Maximum-likelihood estimation
We extend logistic discriminant function methodology to compete effectively with neural networks and "information theory" methods in prediction of protein secondary structure. Unlike "black-box" methods, our model produces 400 pairwise interaction parameters which are interpretable from a molecular standpoint. Under optimal conditions, our model can produce up to 65.9% cross-validated prediction accuracy on three states. A broad family of models is searched using a semi-parametric (penalized) approach combined with stepwise parameter selection. We show that optimal models have about 800 effective parameters for this data set. The highest prediction accuracy is concentrated in a fraction of the total residues, and the confidence of a prediction can be easily calculated. Such high-confidence predictions may be useful as the basis for prediction of the complete structure of the protein.
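
The abstract notes that the confidence of a prediction can be easily calculated but does not say how. The following is a minimal sketch of one plausible reading, assuming the logistic model yields one raw score per state for each residue and that confidence is taken as the largest softmax probability. The state names, the function names, and the 0.8 threshold are all hypothetical and are not taken from the paper.

```python
import math

# Assumed labels for the three states; the abstract does not name them.
STATES = ("helix", "strand", "coil")


def softmax(scores):
    """Turn raw per-state scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(scores):
    """Return (predicted state, confidence) for one residue.

    Confidence is assumed here to be the largest class probability.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return STATES[best], probs[best]


def high_confidence(per_residue_scores, threshold=0.8):
    """Keep only residues whose confidence meets the (hypothetical) threshold."""
    kept = []
    for index, scores in enumerate(per_residue_scores):
        state, conf = predict_with_confidence(scores)
        if conf >= threshold:
            kept.append((index, state, conf))
    return kept
```

Under these assumptions, calling `high_confidence` on the per-residue scores of a sequence would return the subset of residues on which the model is most certain, which is the kind of subset the abstract suggests as a starting point for predicting the complete structure.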