Academic Paper

Automated Feature Interaction and Feature Representation Learning of Multi-field Categorical Data
Document Type
Conference
Source
2023 9th International Conference on Big Data and Information Analytics (BigDIA), pp. 37-42, Dec. 2023
Subject
Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Signal Processing and Analysis
Representation learning
Frequency modulation
Neural networks
Feature extraction
Production facilities
Pattern recognition
Recommender systems
Multi-field Categorical Data
Feature Representation Learning
CTR
Language
ISSN
2771-6902
Abstract
Categorical data is used extensively across diverse domains, including online advertising, recommender systems, and web search. Conventional approaches represent each categorical value as a binary feature in a high-dimensional space using one-hot encoding, which leads to severe data sparsity. A feature embedding technique is therefore required. The Factorization Machine (FM), although effective for feature embedding, struggles to uncover intricate high-order interaction patterns. This study introduces a novel Cat2vec-based Factorization Machine (CFM) designed to learn distributed representations of multi-field categorical data. The model uses an Embedding Layer to extract hidden vectors from the raw features, an Interaction Layer and a K-Max Pooling Layer to perform automatic feature interaction and capture significant high-order interactions, and finally an FM Layer to determine the model's loss function. Experiments on a large public CTR prediction dataset show that CFM outperforms several strong baselines.
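
The pipeline named in the abstract (embedding lookup, feature interaction, K-max pooling, FM scoring) can be illustrated with a minimal sketch. The abstract does not specify the layer internals, so everything below is an assumption made for illustration: the element-wise product as the interaction, the L2 norm as the K-max selection criterion, feeding field and pooled vectors together into the standard FM pairwise term, the log loss, and all names and sizes (`CARDINALITIES`, `DIM`, `predict_ctr`). It is not the authors' implementation.

```python
import math
import random
from itertools import combinations

def embed(sample, tables):
    """Look up one dense vector per field instead of a sparse one-hot input."""
    return [tables[field][value] for field, value in enumerate(sample)]

def pairwise_interactions(vectors):
    """Element-wise product of every pair of field embeddings (assumed interaction)."""
    return [[a * b for a, b in zip(u, v)] for u, v in combinations(vectors, 2)]

def k_max_pooling(vectors, k):
    """Keep the k vectors with the largest squared L2 norm (assumed criterion)."""
    return sorted(vectors, key=lambda v: sum(x * x for x in v), reverse=True)[:k]

def fm_second_order(vectors):
    """Standard FM pairwise term: 0.5 * sum_d((sum_i v_id)^2 - sum_i v_id^2)."""
    total = 0.0
    for d in range(len(vectors[0])):
        s = sum(v[d] for v in vectors)
        sq = sum(v[d] ** 2 for v in vectors)
        total += 0.5 * (s * s - sq)
    return total

def predict_ctr(sample, tables, k=2):
    """Embedding -> interaction -> K-max pooling -> FM score -> sigmoid."""
    fields = embed(sample, tables)
    pooled = k_max_pooling(pairwise_interactions(fields), k)
    # Assumption: field embeddings and pooled interactions are scored together.
    logit = fm_second_order(fields + pooled)
    return 1.0 / (1.0 + math.exp(-logit))

def log_loss(label, prob):
    """Binary cross-entropy for a single example."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

random.seed(0)
DIM = 4                    # hypothetical embedding size
CARDINALITIES = [3, 5, 2]  # hypothetical number of categories per field
tables = [
    [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(n)]
    for n in CARDINALITIES
]

sample = [2, 0, 1]         # one category index per field
prob = predict_ctr(sample, tables)
loss = log_loss(1, prob)
print(f"predicted CTR = {prob:.4f}, log loss = {loss:.4f}")
```

In this sketch the embedding tables are random rather than trained; in practice they would be learned by minimizing the loss over a CTR dataset. The three-field sample yields three pairwise interactions, of which K-max pooling keeps the two with the largest norm.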