Academic Paper

How to hide the elephant- or the donkey- in the room: Practical privacy against statistical inference for large data
Document Type
Conference
Source
2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 269-272, Dec. 2013
Subject
Signal Processing and Analysis
Privacy
Data privacy
Quantization (signal)
TV
Optimization
Mutual information
Vectors
Language
Abstract
We propose a practical methodology to protect a user's private data when he wishes to publicly release data that is correlated with his private data, in the hope of getting some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a privacy-preserving probabilistic mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information leakage under a distortion constraint. We address a practical challenge encountered when applying this theoretical framework to real-world data: the optimization may become intractable and face scalability issues when data takes values in large alphabets or is high-dimensional. Our work makes two major contributions. First, we reduce the optimization size by introducing a quantization step, and show how to generate privacy mappings under quantization. Second, we evaluate our method on a dataset showing correlations between political views and TV viewing habits, and demonstrate that good privacy properties can be achieved with limited distortion, so as not to undermine the original purpose of the publicly released data, e.g., recommendations.
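
The two quantities that the abstract trades off can be illustrated with a small sketch. It assumes, as a reading of the abstract and its "Mutual information" subject term, that leakage is measured as the mutual information I(A;Y) between private data A and released data Y, and that Y is produced from the public data B alone through a mapping p(y|b). The toy joint distribution, the Hamming distortion, and every function name below are hypothetical, chosen only for illustration. The sketch evaluates candidate mappings; it does not solve the convex optimization or perform the quantization step described in the paper.

```python
import numpy as np


def mutual_information(p_joint):
    """Mutual information in bits of a joint pmf given as a 2-D array."""
    p_row = p_joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_col = p_joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = p_joint > 0                           # skip zero cells (0 * log 0 = 0)
    ratio = p_joint[mask] / (p_row @ p_col)[mask]
    return float(np.sum(p_joint[mask] * np.log2(ratio)))


def leakage_and_distortion(p_ab, p_y_given_b, dist):
    """Return (I(A;Y), E[d(B,Y)]) for a mapping p(y|b).

    Assumes Y depends on A only through B (Markov chain A -> B -> Y).
    p_ab[a, b] is the joint pmf, p_y_given_b[b, y] the mapping,
    dist[b, y] the distortion between a public value and a released value.
    """
    p_ay = p_ab @ p_y_given_b                    # p(a,y) = sum_b p(a,b) p(y|b)
    p_b = p_ab.sum(axis=0)                       # marginal of B
    p_by = p_b[:, None] * p_y_given_b            # p(b,y) = p(b) p(y|b)
    return mutual_information(p_ay), float(np.sum(p_by * dist))


# Hypothetical toy data: binary private attribute A, binary public data B.
p_ab = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
hamming = 1.0 - np.eye(2)                        # d(b, y) = 1 if b != y else 0

release_as_is = np.eye(2)                        # Y = B: no distortion
release_constant = np.array([[1.0, 0.0],
                             [1.0, 0.0]])        # Y ignores B entirely
release_noisy = np.array([[0.75, 0.25],
                          [0.25, 0.75]])         # flip B with probability 0.25

leak_id, dist_id = leakage_and_distortion(p_ab, release_as_is, hamming)
leak_const, dist_const = leakage_and_distortion(p_ab, release_constant, hamming)
leak_noisy, dist_noisy = leakage_and_distortion(p_ab, release_noisy, hamming)
```

In this toy setting, releasing B unchanged gives zero distortion but leaks all of I(A;B), while a constant release leaks nothing at the cost of high distortion; the noisy mapping falls in between. An optimization of the kind the abstract describes would search over all mappings p(y|b) for the one with the least leakage at a given distortion budget.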