Academic Paper

Subsampling based variable selection for generalized linear models.
Document Type
Academic Journal
Author
Capanu M; Memorial Sloan Kettering Cancer Center, NY, USA.; Giurcanu M; Department of Public Health Sciences, University of Chicago, IL, USA.; Begg CB; Memorial Sloan Kettering Cancer Center, NY, USA.; Gönen M; Memorial Sloan Kettering Cancer Center, NY, USA.
Source
Publisher: Elsevier B.V Country of Publication: Netherlands NLM ID: 100960938 Publication Model: Print-Electronic Cited Medium: Print ISSN: 0167-9473 (Print) Linking ISSN: 01679473 NLM ISO Abbreviation: Comput Stat Data Anal Subsets: PubMed not MEDLINE
Language
English
ISSN
0167-9473
Abstract
A novel variable selection method for low-dimensional generalized linear models is introduced. The new approach, called AIC OPTimization via STABility Selection (OPT-STABS), repeatedly subsamples the data, minimizes Akaike's Information Criterion (AIC) over a sequence of nested models for each subsample, and includes in the final model those predictors selected in the minimum-AIC model in a large fraction of the subsamples. New methods are also introduced to establish an optimal variable selection cutoff over repeated subsamples. An extensive simulation study examining a variety of proposed variable selection methods shows that, although no single method uniformly outperforms the others in all the scenarios considered, OPT-STABS is consistently among the best-performing methods in most settings and performs competitively in the rest. This is in contrast to other candidate methods, which either perform poorly across the board or perform well in some settings but very poorly in others. In addition, the asymptotic properties of the OPT-STABS estimator are derived, and its root-n consistency and asymptotic normality are proved. The methods are applied to two datasets involving logistic and Poisson regressions.
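The subsample, minimize-AIC, and vote procedure summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a Gaussian linear model in place of a general GLM. It builds each subsample's nested model sequence by ordering predictors on absolute marginal correlation with the response, which is an assumed ordering rule. It draws subsamples of half the data without replacement and applies a fixed selection cutoff of 0.5 instead of the optimized cutoff the paper proposes. The function names (ols_rss, min_aic_model, opt_stabs_sketch) are hypothetical.

```python
import math
import random


def ols_rss(X, y, cols):
    """Residual sum of squares of an OLS fit with intercept on columns `cols`."""
    n = len(y)
    rows = [[1.0] + [X[i][j] for j in cols] for i in range(n)]
    k = len(cols) + 1
    # Normal equations A b = c with A = Z'Z and c = Z'y.
    A = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    c = [sum(rows[i][a] * y[i] for i in range(n)) for a in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (c[r] - sum(A[r][j] * b[j] for j in range(r + 1, k))) / A[r][r]
    return sum((y[i] - sum(rows[i][j] * b[j] for j in range(k))) ** 2
               for i in range(n))


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def min_aic_model(X, y):
    """Minimize Gaussian AIC over a nested sequence (assumed ordering rule)."""
    n, p = len(y), len(X[0])
    order = sorted(range(p), key=lambda j: -abs_corr([row[j] for row in X], y))
    best, best_aic = [], float("inf")
    for m in range(p + 1):
        cols = order[:m]
        # AIC up to an additive constant: n*log(RSS/n) + 2*(number of coefficients).
        aic = n * math.log(ols_rss(X, y, cols) / n) + 2 * (m + 1)
        if aic < best_aic:
            best, best_aic = cols, aic
    return set(best)


def opt_stabs_sketch(X, y, n_subsamples=100, frac=0.5, cutoff=0.5, seed=0):
    """Return (selected predictor indices, per-predictor selection frequencies)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    m = int(frac * n)
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)  # subsample without replacement
        for j in min_aic_model([X[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    freqs = [cnt / n_subsamples for cnt in counts]
    chosen = [j for j in range(p) if freqs[j] >= cutoff]
    return chosen, freqs


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
    y = [2.0 * r[0] - 1.5 * r[2] + rng.gauss(0, 1) for r in X]
    chosen, freqs = opt_stabs_sketch(X, y)
    print(chosen, [round(f, 2) for f in freqs])
```

In this simulated example, predictors 0 and 2 carry the signal and should be selected in every subsample. The frequencies of the three noise predictors depend on the random draw. Swapping in a logistic or Poisson likelihood, or a data-driven cutoff, would change only the model-fitting and thresholding steps, not the overall subsample-and-vote structure.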