Academic Article

Unifying approach to selective inference with applications to cross-validation
Document Type
Working Paper
Subject
Statistics - Methodology
Abstract
We develop tools for valid post-selective inference for a family of model selection procedures, including choosing a model via the cross-validated Lasso. The tools apply whenever the following random vectors are jointly asymptotically multivariate Gaussian: (i) the vector of quality values of the candidate models evaluated under a given model selection criterion (e.g., cross-validation errors across folds, AIC, prediction errors), and (ii) the test statistics from which we make inference on the parameters, where the parameters themselves are chosen only after the model selection step. Under these assumptions, we derive a pivotal quantity with an asymptotically Unif(0,1) distribution, which can be used to perform tests and construct confidence intervals; both the tests and the confidence intervals are selectively valid for the chosen parameter. While the above assumptions may not be satisfied in some applications, we propose a variation of these model selection procedures that adds Gaussian randomization to either of the two vectors. With this randomization, the joint distribution of the above random vectors is multivariate Gaussian and our general tools apply. We illustrate our method by applying it to four important procedures for which few selective inference results have been developed: the cross-validated Lasso, the cross-validated randomized Lasso, AIC-based model selection among a fixed set of models, and inference for a newly introduced marginal LOCO parameter inspired by the LOCO parameter of Rinaldo et al. (2016); we provide complete results for these cases. For randomized model selection procedures, we develop a Markov chain Monte Carlo sampling scheme to construct valid post-selective confidence intervals empirically.
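As a rough illustration of the randomization idea described above, the following Python sketch perturbs the vector of cross-validation errors of the Lasso with Gaussian noise before the tuning parameter is chosen, which is the "cross-validated randomized Lasso" selection step in spirit. It is a minimal sketch, not the authors' implementation: the simulated data, the candidate grid `lambdas`, and the randomization scale `omega_sd` are illustrative assumptions, and the subsequent selective inference on the chosen coefficients is not shown.

```python
# Minimal sketch of randomized cross-validated Lasso selection (illustrative only).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Simulated data: n observations, p features, sparse signal (assumed setup).
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.standard_normal(n)

lambdas = np.logspace(-2, 0, 20)  # hypothetical grid of Lasso penalties
omega_sd = 0.1                    # hypothetical scale of the Gaussian randomization

# Ordinary K-fold cross-validation errors for each candidate penalty.
cv_err = np.zeros(len(lambdas))
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for j, lam in enumerate(lambdas):
        fit = Lasso(alpha=lam).fit(X[train], y[train])
        resid = y[test] - fit.predict(X[test])
        cv_err[j] += np.mean(resid ** 2) / 5

# Randomized selection: add Gaussian noise to the CV-error vector before
# picking the penalty, so the selection depends on a (perturbed) Gaussian vector.
randomized_err = cv_err + omega_sd * rng.standard_normal(len(lambdas))
lam_hat = lambdas[np.argmin(randomized_err)]

# The selected model is the Lasso active set at the chosen penalty; selective
# inference would then target coefficients of this data-dependent set.
active = np.flatnonzero(Lasso(alpha=lam_hat).fit(X, y).coef_ != 0)
print("selected penalty:", lam_hat, "active set:", active)
```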