Academic Paper

Supervised Learning for Nonsequential Data: A Canonical Polyadic Decomposition Approach
Document Type
Periodical
Source
IEEE Transactions on Neural Networks and Learning Systems, 33(10):5162-5176, Oct. 2022
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
General Topics for Engineers
Keywords
Tensors
Data models
Predictive models
Feature extraction
Task analysis
Artificial neural networks
Supervised learning
Canonical polyadic decomposition (CPD)
classification
nonsequential data
recommender systems
regression
sparse data
tensor networks (TNs)
Language
English
ISSN
2162-237X (Print)
2162-2388 (Electronic)
Abstract
Efficient modeling of feature interactions underpins supervised learning for nonsequential tasks, which are characterized by a lack of inherent ordering of their features (variables). The brute-force approach of learning a parameter for each interaction of every order comes at an exponential computational and memory cost (curse of dimensionality). To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor whose order equals the number of features; for efficiency, this tensor can be further factorized into a compact tensor train (TT) format. However, both TT and other tensor networks (TNs), such as the tensor ring and hierarchical Tucker, are sensitive to the ordering of their indices (and hence of the features). To establish the desired invariance to feature ordering, we propose to represent the weight tensor through the canonical polyadic (CP) decomposition (CPD) and introduce the associated inference and learning algorithms, including suitable regularization and initialization schemes. It is demonstrated that the proposed CP-based predictor significantly outperforms other TN-based predictors on sparse data, while exhibiting comparable performance on dense nonsequential tasks. Furthermore, for enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors. In conjunction with feature-vector normalization, this is shown to yield dramatic improvements in performance for dense nonsequential tasks, matching models such as fully connected neural networks.
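
To make the inference step described in the abstract concrete, the following is a minimal NumPy sketch (the function name, shapes, and toy data are assumptions for illustration, not the authors' implementation). With the order-d weight tensor held in rank-R CP form, W = sum_r a_r^(1) x ... x a_r^(d), the prediction <W, phi(x_1) x ... x phi(x_d)> reduces to projecting each mapped feature vector onto its factor matrix, combining the projections by an elementwise (Hadamard) product across modes, and summing the R rank-1 contributions, at O(d*m*R) cost instead of the O(m^d) cost of materializing W:

    import numpy as np

    def cp_predict(factors, feats):
        """Score one sample under a rank-R CP weight tensor.

        factors: list of d arrays of shape (m, R), the CP factor matrices.
        feats:   list of d arrays of shape (m,), the mapped feature vectors phi(x_j).
        Returns <W, phi(x_1) x ... x phi(x_d)> without ever forming W.
        """
        rank = factors[0].shape[1]
        acc = np.ones(rank)            # running Hadamard product over the d modes
        for A, phi in zip(factors, feats):
            acc *= A.T @ phi           # projection of mode j onto each rank-1 term
        return acc.sum()               # sum the R rank-1 contributions

    # Toy usage: d = 3 features, each mapped to m = 4 dimensions, CP rank R = 5.
    rng = np.random.default_rng(0)
    factors = [rng.standard_normal((4, 5)) for _ in range(3)]
    feats = [rng.standard_normal(4) for _ in range(3)]
    print(cp_predict(factors, feats))

Because every mode enters this computation symmetrically, through its own factor matrix and a commutative elementwise product, the CP format assigns no special role to any feature position; this is the invariance to feature ordering that the abstract contrasts with TT and other TNs, whose bond structure depends on the index order.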