Academic Article

Mutual Information Regularized Feature-Level Frankenstein for Discriminative Recognition
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5243-5260, Sep. 2022
Subject
Computing and Processing
Bioengineering
Task analysis
Semantics
Face recognition
Lighting
Mutual information
Training
Image color analysis
Feature-level disentanglement
discriminative recognition
mutual information
adversarial learning
Language
English
ISSN
0162-8828
2160-9292
1939-3539
Abstract
Deep learning recognition approaches can potentially perform better if we can extract a discriminative representation that controllably separates nuisance factors. In this paper, we propose a novel approach to explicitly enforce the extracted discriminative representation $\boldsymbol{d}$, the extracted latent variation $\boldsymbol{l}$ (e.g., background, unlabeled nuisance attributes), and the semantic variation label vector $\boldsymbol{s}$ (e.g., labeled expressions/pose) to be independent and complementary to each other. We cast this problem as an adversarial game in the latent space of an auto-encoder. Specifically, with the to-be-disentangled $\boldsymbol{s}$, we propose to equip an end-to-end conditional adversarial network with the ability to decompose an input sample into $\boldsymbol{d}$ and $\boldsymbol{l}$. However, we argue that maximizing the cross-entropy loss of semantic variation prediction from $\boldsymbol{d}$ is not sufficient to remove the impact of $\boldsymbol{s}$ from $\boldsymbol{d}$, and that uniform-target and entropy regularization are necessary. A collaborative mutual information regularization framework is further proposed to avoid unstable adversarial training; it minimizes the differentiable mutual information between the variables to enforce independence. The proposed discriminative representation inherits the desired tolerance property guided by prior knowledge of the task. Our framework achieves top performance on diverse recognition tasks, including digit classification, large-scale face recognition on the LFW and IJB-A datasets, and face recognition tolerant to changes in lighting, makeup, disguise, etc.
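To make the uniform-target regularization mentioned in the abstract concrete, the sketch below shows one way such a term can be written in PyTorch. This is not the authors' code; the encoder, probe head, dimensions, and loss wiring are hypothetical choices, used only to illustrate pushing the prediction of $\boldsymbol{s}$ from $\boldsymbol{d}$ toward a uniform distribution.

```python
# Minimal sketch (assumed setup, not the paper's implementation):
# a probe head tries to predict the semantic label s from the representation d,
# and a uniform-target regularizer penalizes the encoder whenever that
# prediction is better than chance.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10   # assumed number of semantic variation labels s
feat_dim = 64      # assumed dimensionality of d

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, feat_dim))
s_head = nn.Linear(feat_dim, num_classes)  # probe: predicts s from d

def uniform_target_loss(logits):
    """KL(softmax(logits) || Uniform) = log K - H(p).

    Minimizing this pushes the predicted distribution over s toward uniform,
    i.e., it discourages d from carrying information about s.
    """
    log_p = F.log_softmax(logits, dim=1)
    p = log_p.exp()
    return (p * (log_p + math.log(num_classes))).sum(dim=1).mean()

# Toy batch: x is the input, s the semantic labels to be removed from d.
x = torch.randn(32, 784)
s = torch.randint(0, num_classes, (32,))

d = encoder(x)
logits_s = s_head(d)

# The probe itself would be trained with ordinary cross-entropy on s (omitted);
# the encoder is trained against the uniform-target term so that even the best
# probe can do no better than chance.
loss_encoder = uniform_target_loss(logits_s)
loss_encoder.backward()
```

In practice this term would be alternated with cross-entropy training of the probe, which is the adversarial game the abstract describes; the collaborative mutual information regularization proposed in the paper is presented as a more stable alternative to that min-max training, minimizing a differentiable mutual information estimate between the variables instead.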