Academic Paper

Stratified cross-validation for unbiased and privacy-preserving federated learning

Document Type

Working Paper

Author

Bey, R.; Goussault, R.; Benchoufi, M.; Porcher, R.

Source

Subject

Statistics - Machine Learning
Computer Science - Machine Learning
Statistics - Methodology

Language

Abstract

Large-scale collections of electronic records constitute both an opportunity for the development of more accurate prediction models and a threat to privacy. To limit privacy exposure, new privacy-enhancing techniques are emerging, such as federated learning, which enables large-scale data analysis while avoiding the centralization of records in a single database that would represent a critical point of failure. Although promising for privacy protection, federated learning prevents the use of some data-cleaning algorithms, thus inducing new biases. In this work, we focus on the recurrent problem of duplicated records, which, if not handled properly, may cause over-optimistic estimates of a model's performance. We introduce and discuss stratified cross-validation, a validation methodology that leverages stratification techniques to prevent data leakage in federated learning settings without relying on demanding deduplication algorithms.
Comment: 13 pages, 5 figures
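
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that duplicated records share some coarse attributes (here the hypothetical fields `birth_year` and `sex`) and that every site applies the same deterministic rule mapping those attributes to a fold. Under that assumption, copies of a record held at different sites fall into the same fold, so no copy is used for training while another is used for validation, and no cross-site deduplication is needed. The function names `fold_of` and `stratified_folds` are illustrative only.

```python
import hashlib


def fold_of(stratum_key: str, n_folds: int = 5) -> int:
    """Map a stratum key to a fold index, identically on every site.

    hashlib is used instead of the built-in hash(), which is salted
    per process and would give different folds on different machines.
    """
    digest = hashlib.sha256(stratum_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def stratified_folds(records, key_fields, n_folds=5):
    """Split one site's local records into folds by stratum.

    key_fields is an assumed list of attributes that duplicates share.
    """
    folds = [[] for _ in range(n_folds)]
    for rec in records:
        key = "|".join(str(rec[f]) for f in key_fields)
        folds[fold_of(key, n_folds)].append(rec)
    return folds


# Toy data (invented for illustration): site B holds a duplicate
# of the first record of site A.
site_a = [
    {"birth_year": 1980, "sex": "F", "y": 1},
    {"birth_year": 1975, "sex": "M", "y": 0},
]
site_b = [{"birth_year": 1980, "sex": "F", "y": 1}]

folds_a = stratified_folds(site_a, ["birth_year", "sex"])
folds_b = stratified_folds(site_b, ["birth_year", "sex"])
```

Each site computes its folds locally, without exchanging records. In cross-validation round k, every site would hold out its own fold k, so the duplicate above is held out at both sites in the same round.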

Online Access

Open Access (arXiv)