학술논문

Optimal Minimization of the Covariance Loss
Document Type
Periodical
Source
IEEE Transactions on Information Theory, 69(2):813-818, Feb. 2023
Subject
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Complexity theory
Covariance matrices
Physics
Partitioning algorithms
Loss measurement
Upper bound
Random variables
Covariance loss
pinning
randomized rounding
synthetic data
differential privacy
Language
English
ISSN
0018-9448
1557-9654
Abstract
Let $X$ be a random vector valued in $\mathbb{R}^m$ such that $\|X\|_2 \le 1$ almost surely. For every $k \ge 3$, we show that there exists a sigma algebra $\mathcal{F}$ generated by a partition of $\mathbb{R}^m$ into $k$ sets such that $\|\mathrm{Cov}(X) - \mathrm{Cov}(\mathbb{E}[X \mid \mathcal{F}])\|_{\mathrm{F}} \lesssim \frac{1}{\sqrt{\log k}}$. This is optimal up to the implicit constant and improves on a previous bound due to Boedihardjo, Strohmer, and Vershynin. Our proof provides an efficient algorithm for constructing $\mathcal{F}$ and leads to improved accuracy guarantees for $k$-anonymous or differentially private synthetic data. We also establish a connection between the above problem of minimizing the covariance loss and the pinning lemma from statistical physics, providing an alternate (and much simpler) algorithmic proof in the important case when $X \in \{\pm 1\}^m / \sqrt{m}$ almost surely.
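To make the quantity in the abstract concrete, the following minimal sketch estimates the covariance loss $\|\mathrm{Cov}(X) - \mathrm{Cov}(\mathbb{E}[X \mid \mathcal{F}])\|_{\mathrm{F}}$ from samples. The partition used here (a sign-based split into $k = 4$ cells) is an arbitrary illustrative choice, not the paper's optimal construction, and the sampling distribution (uniform in the unit ball) is likewise an assumption for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw n samples of a random vector X in R^m with ||X||_2 <= 1
# (uniform in the unit ball; an arbitrary choice for illustration).
m, n = 3, 10_000
X = rng.normal(size=(n, m))
X /= np.linalg.norm(X, axis=1, keepdims=True)
X *= rng.uniform(size=(n, 1)) ** (1.0 / m)

# A toy partition of R^m into k = 4 sets: cells indexed by the signs of
# the first two coordinates. This is NOT the paper's construction, just
# a simple sigma algebra F generated by a finite partition.
labels = 2 * (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)

# Conditional expectation E[X | F]: replace each sample by the mean of
# its partition cell.
cond = np.zeros_like(X)
for c in np.unique(labels):
    mask = labels == c
    cond[mask] = X[mask].mean(axis=0)

def cov(A):
    # Sample covariance matrix (variables in columns).
    return np.cov(A, rowvar=False)

# Empirical covariance loss: Frobenius norm of Cov(X) - Cov(E[X|F]).
loss = np.linalg.norm(cov(X) - cov(cond), ord="fro")
print(f"covariance loss ~ {loss:.3f}")
```

By the law of total variance, $\mathrm{Cov}(X) - \mathrm{Cov}(\mathbb{E}[X \mid \mathcal{F}])$ is positive semidefinite with trace at most $\mathbb{E}\|X\|_2^2 \le 1$, so the loss always lies in $[0, 1]$ here; the theorem says a good choice of the $k$-cell partition drives it down to $O(1/\sqrt{\log k})$.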