Academic Paper

Evaluating Regression Models with Partial Data: A Sampling Approach
Document Type
Conference
Source
2023 9th International Conference on Control, Decision and Information Technologies (CoDIT), pp. 1882-1887, Jul. 2023
Subject
Aerospace
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Machine learning algorithms
Pipelines
Estimation
Machine learning
Nonuniform sampling
Data models
Manufacturing
Language
ISSN
2576-3555
Abstract
Machine learning methods rely on data to uncover relationships between the inputs and outputs of complex systems, so sufficient amounts of representative data are crucial. Recent research has therefore focused on choosing informative input-output pairs, i.e., labeled data, to facilitate the adoption of machine learning in science and engineering applications. Despite these efforts, estimating the test error with a limited amount of labeled data remains underexplored. This paper investigates a novel framework for selecting informative labeled samples from a set of unlabeled testing instances to evaluate regression models under the quadratic loss function. Key contributions include the design of nonuniform sampling distributions over candidate testing points and the use of an unbiased estimator to achieve desirable tradeoffs between estimation accuracy and testing data size. Experimental results on real-world applications corroborate the performance and flexibility of the proposed approach; for example, it reduces the standard deviation of the resulting estimator by almost a factor of two compared to uniform sampling. The paper concludes with practical advice for researchers and practitioners facing limited labeled data.
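
The abstract does not specify the sampling distributions or the exact estimator, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a standard importance-weighted estimator (sampling with replacement, each sampled loss divided by N times its sampling probability), synthetic heteroscedastic data, and a hypothetical proxy score (`proxy`) that happens to track the expected squared error. All names and parameter values are illustrative assumptions.

```python
import numpy as np


def importance_weighted_mse(loss_of, probs, budget, rng):
    """Estimate the mean quadratic loss over N candidates from `budget` labels.

    Indices are drawn with replacement according to `probs`; each sampled
    loss is divided by N * probs[i], which keeps the estimator unbiased for
    any distribution that puts positive mass on every candidate.
    """
    n = len(probs)
    idx = rng.choice(n, size=budget, replace=True, p=probs)
    return float(np.mean(loss_of(idx) / (n * probs[idx])))


rng = np.random.default_rng(0)

# Synthetic test pool (assumption): noise grows with |x|, model predicts 2x.
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
scale = 0.1 + np.abs(x)
y = 2.0 * x + scale * rng.normal(size=n)
pred = 2.0 * x
true_mse = float(np.mean((y - pred) ** 2))  # needs all labels; reference only


def loss_of(idx):
    # Stands in for "query the labels of the sampled points, then score them".
    return (y[idx] - pred[idx]) ** 2


uniform = np.full(n, 1.0 / n)

# Hypothetical proxy for the expected loss, mixed with the uniform
# distribution so that no candidate has a vanishing probability.
proxy = scale ** 2
nonuniform = 0.9 * proxy / proxy.sum() + 0.1 * uniform

budget, trials = 50, 1000
est_uniform = np.array(
    [importance_weighted_mse(loss_of, uniform, budget, rng) for _ in range(trials)]
)
est_nonuniform = np.array(
    [importance_weighted_mse(loss_of, nonuniform, budget, rng) for _ in range(trials)]
)

print("true MSE:          ", true_mse)
print("uniform    mean/std:", est_uniform.mean(), est_uniform.std())
print("nonuniform mean/std:", est_nonuniform.mean(), est_nonuniform.std())
```

In this sketch both estimators average close to the full-pool MSE, while the nonuniform one shows a smaller spread because its weighted losses vary less. How large the reduction is depends entirely on how well the proxy matches the true losses, which is an assumption here.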