Journal Article
Optimal Minimization of the Covariance Loss
Document Type
Periodical
Author
Source
IEEE Transactions on Information Theory, 69(2):813-818, Feb. 2023
Subject
Language
English
ISSN
0018-9448
1557-9654
Abstract
Let $X$ be a random vector valued in $\mathbb{R}^m$ such that $\|X\|_2 \le 1$ almost surely. For every $k \ge 3$, we show that there exists a $\sigma$-algebra $\mathcal{F}$ generated by a partition of $\mathbb{R}^m$ into $k$ sets such that $\|\mathrm{Cov}(X) - \mathrm{Cov}(\mathbb{E}[X \mid \mathcal{F}])\|_{\mathrm{F}} \lesssim \frac{1}{\sqrt{\log k}}$. This is optimal up to the implicit constant and improves on a previous bound due to Boedihardjo, Strohmer, and Vershynin. Our proof provides an efficient algorithm for constructing $\mathcal{F}$ and leads to improved accuracy guarantees for $k$-anonymous or differentially private synthetic data. We also establish a connection between the above problem of minimizing the covariance loss and the pinning lemma from statistical physics, providing an alternate (and much simpler) algorithmic proof in the important case when $X \in \{\pm 1\}^m/\sqrt{m}$ almost surely.
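The quantity the abstract bounds, $\|\mathrm{Cov}(X) - \mathrm{Cov}(\mathbb{E}[X \mid \mathcal{F}])\|_{\mathrm{F}}$, can be illustrated numerically. The sketch below is NOT the paper's construction of $\mathcal{F}$; it is a toy experiment using an illustrative partition of $\mathbb{R}^m$ into $k$ slabs along the first coordinate (the name `covariance_loss` and the slab partition are assumptions for this example). By the law of total covariance, $\mathrm{Cov}(X) - \mathrm{Cov}(\mathbb{E}[X \mid \mathcal{F}])$ is positive semidefinite, and refining the partition can only shrink the loss.

```python
import numpy as np

def covariance_loss(X, labels):
    """Frobenius norm of Cov(X) - Cov(E[X | F]), where the sigma algebra F
    is generated by the partition encoded in `labels` (one cell id per row)."""
    Y = np.empty_like(X)
    for c in np.unique(labels):
        # Conditional expectation: replace each point by its cell's mean.
        Y[labels == c] = X[labels == c].mean(axis=0)
    return np.linalg.norm(np.cov(X.T) - np.cov(Y.T))

rng = np.random.default_rng(0)
m, n = 5, 20000

# Sample points and project into the unit ball, so ||X||_2 <= 1 almost surely.
X = rng.uniform(-1.0, 1.0, (n, m))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)

# Toy partition of R^m into k equal-width slabs by the first coordinate
# (illustrative only -- not the algorithm from the paper).
losses = {}
for k in (2, 8, 32):
    labels = np.digitize(X[:, 0], np.linspace(-1.0, 1.0, k + 1)[1:-1])
    losses[k] = covariance_loss(X, labels)
    print(k, losses[k])
```

Because the $k=32$ slabs refine the $k=8$ slabs, which refine the $k=2$ slabs, the printed losses are monotonically nonincreasing in $k$, consistent with the abstract's $1/\sqrt{\log k}$ decay for the (far better) partition the paper constructs.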