e-Article

PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences
Document Type
Conference
Source
2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 251-252, Jun. 2023
Subject
Computing and Processing
Training
Data privacy
Analytical models
Law
Social sciences
Machine learning
Data models
distributed analytics
data privacy
social sciences
data science
Language
ISSN
2575-8152
Abstract
Data privacy and ownership are significant in social data science, raising legal and ethical concerns. Sharing and analyzing data is difficult when different parties own different parts of it. An approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME-SoSci, a distributed analytics tool that federates model implementation and training. PADME-SoSci uses a federated approach where the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, the results are not provided until the analysis is completed on all data locations to ensure privacy and avoid bias in the results.
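The incremental training pattern described in the abstract can be illustrated with a minimal sketch. This is not PADME-SoSci's API: the `Station` class, the `train_locally` and `run_train` functions, and the choice of a one-variable linear model trained with stochastic gradient descent are all hypothetical and chosen only to keep the example short. The sketch shows a single model visiting each data location in turn, with records staying at their location and the result returned only after the last location has been visited.

```python
from dataclasses import dataclass


@dataclass
class Station:
    """Hypothetical data location; its records are only read inside this class."""
    name: str
    _records: list  # (x, y) pairs that stay at this station

    def train_locally(self, weights, lr=0.05, epochs=500):
        """Update the visiting model on local data; only weights are returned."""
        w, b = weights
        for _ in range(epochs):
            for x, y in self._records:
                err = (w * x + b) - y  # prediction error on one local record
                w -= lr * err * x      # gradient step for the slope
                b -= lr * err          # gradient step for the intercept
        return (w, b)


def run_train(stations, initial_weights=(0.0, 0.0)):
    """Send one model to every station in sequence (assumed visiting order:
    list order) and release the weights only after the final station."""
    weights = initial_weights
    for station in stations:
        weights = station.train_locally(weights)
    return weights


if __name__ == "__main__":
    # Toy data: both stations hold points from the same line y = 2x + 1.
    stations = [
        Station("A", [(0.0, 1.0), (1.0, 3.0)]),
        Station("B", [(2.0, 5.0), (3.0, 7.0)]),
    ]
    w, b = run_train(stations)
    print(f"slope={w:.3f}, intercept={b:.3f}")
```

In this sketch only the model weights travel between stations, and the caller sees nothing until `run_train` returns, which mirrors the abstract's statement that results are withheld until all locations have been processed. A real deployment would also need authentication, transport, and safeguards against information leaking through the weights, none of which are modeled here.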