Academic Paper

PILLAR: How to make semi-private learning more effective
Document Type
Conference
Source
2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pp. 110-139, Apr. 2024
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Training
Privacy
Machine learning algorithms
Machine learning
Complexity theory
Computational efficiency
Probes
Abstract
In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose PILLAR, an easy-to-implement and computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets. The key idea is to use the public data to estimate the principal components of the pre-trained features and then project the private dataset onto the top-k principal components. We empirically validate the effectiveness of our algorithm in a wide variety of experiments under tight privacy constraints (ϵ < 1) and probe its behaviour in low-data regimes and when the pre-training distribution differs significantly from the one on which SP learning is performed. Despite its simplicity, our algorithm significantly outperforms, in all of these settings, every available baseline that uses similar amounts of public data, even though those baselines are often more computationally expensive. For example, on CIFAR-100 with ϵ = 0.1, our algorithm improves over the most competitive baselines by a factor of at least two.
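The projection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the pre-trained features are given as plain NumPy arrays, and the function name `pillar_projection` and the toy dimensions are invented for this example.

```python
import numpy as np

def pillar_projection(public_feats, private_feats, k):
    """Project private features onto the top-k principal components
    estimated from PUBLIC (unlabelled) features only, so the projection
    itself consumes no privacy budget.
    """
    # Center using statistics of the public data only.
    mean = public_feats.mean(axis=0)
    centered = public_feats - mean
    # Top-k right singular vectors of the centered public features
    # are the top-k principal components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                      # shape (k, d)
    # Project the (centered) private features onto those components.
    return (private_feats - mean) @ components.T

# Toy usage: 200 public and 10 private feature vectors in 16 dimensions,
# reduced to k = 4 dimensions before private training would begin.
rng = np.random.default_rng(0)
public_feats = rng.normal(size=(200, 16))
private_feats = rng.normal(size=(10, 16))
projected = pillar_projection(public_feats, private_feats, k=4)
print(projected.shape)  # (10, 4)
```

A differentially private learner would then be trained on the k-dimensional projected private features, which is where the lower private sample complexity claimed in the abstract would come into play.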