Academic Journal Article

Multiview Deep Subspace Clustering Networks
Document Type
Periodical
Source
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 54(7):4280-4293, Jul. 2024
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Robotics and Control Systems
General Topics for Engineers
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Feature extraction
Image reconstruction
Representation learning
Interviews
Task analysis
Electronic mail
Deep learning
Deep clustering
multiview learning
self-representation
subspace clustering
Language
ISSN
2168-2267
2168-2275
Abstract
Multiview subspace clustering aims to discover the inherent structure of data by fusing multiple views of complementary information. Most existing methods first extract multiple types of handcrafted features and then learn a joint affinity matrix for clustering. The disadvantage of this approach lies in two aspects: 1) multiview relations are not embedded into feature learning and 2) the end-to-end learning manner of deep learning is not suitable for multiview clustering. Even when deep features have been extracted, it is a nontrivial problem to choose a proper backbone for clustering on different datasets. To address these issues, we propose the multiview deep subspace clustering networks (MvDSCN), which learn a multiview self-representation matrix in an end-to-end manner. The MvDSCN consists of two subnetworks, i.e., a diversity network (Dnet) and a universality network (Unet). A latent space is built using deep convolutional autoencoders, and a self-representation matrix is learned in the latent space using a fully connected layer. Dnet learns view-specific self-representation matrices, whereas Unet learns a common self-representation matrix for all views. To exploit the complementarity of multiview representations, the Hilbert–Schmidt independence criterion (HSIC) is introduced as a diversity regularizer that captures the nonlinear, high-order inter-view relations. Because different views share the same label space, the view-specific self-representation matrices are aligned to the common one by universality regularization. The MvDSCN also unifies multiple backbones to boost clustering performance and avoid the need for model selection. Experiments demonstrate the superiority of the MvDSCN.
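
The two regularizers named in the abstract can be illustrated with a small sketch. The abstract does not give their exact form, so everything below is an assumption made for illustration: HSIC is computed with the standard biased empirical estimator and a linear (inner-product) kernel, the universality term is taken to be a sum of squared Frobenius distances, and all function names are hypothetical. Pure Python is used so the example runs without extra libraries; a real implementation would operate on tensors inside the training loop.

```python
# Sketch only: illustrative versions of a diversity (HSIC) term and a
# universality (alignment) term on small n x n self-representation matrices.
# Kernel choice, penalty form, and all names are assumptions, not the paper's code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def linear_kernel(z):
    """Gram matrix K = Z^T Z, treating each column of Z as one sample."""
    return matmul(transpose(z), z)

def hsic(z1, z2):
    """Biased empirical HSIC: tr(K1 H K2 H) / (n - 1)^2 with H = I - (1/n) 11^T."""
    n = len(z1)
    h = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    k1, k2 = linear_kernel(z1), linear_kernel(z2)
    return trace(matmul(matmul(k1, h), matmul(k2, h))) / (n - 1) ** 2

def universality_penalty(z_common, z_views):
    """Sum over views of the squared Frobenius distance to the common matrix."""
    return sum(
        (a - b) ** 2
        for z in z_views
        for row_z, row_c in zip(z, z_common)
        for a, b in zip(row_z, row_c)
    )

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    ones = [[1.0, 1.0], [1.0, 1.0]]
    print(hsic(identity, identity))   # dependence of a matrix with itself
    print(hsic(ones, identity))       # a constant kernel carries no dependence
    print(universality_penalty([[0.0, 0.0], [0.0, 0.0]], [identity, identity]))
```

Under these assumptions, a diversity regularizer would sum `hsic` over pairs of view-specific matrices and be minimized so that the views encode complementary structure, while `universality_penalty` pulls each view-specific matrix toward the common one.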