학술논문

Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations

Document Type

Working Paper

Author

Singh, Anima; Vu, Trung; Mehta, Nikhil; Keshavan, Raghunandan; Sathiamoorthy, Maheswaran; Zheng, Yilin; Hong, Lichan; Heldt, Lukasz; Wei, Li; Tandon, Devansh; Chi, Ed H.; Yi, Xinyang

Source

Subject

Computer Science - Information Retrieval
Computer Science - Machine Learning

Language

Abstract

Randomly-hashed item ids are used ubiquitously in recommendation models. However, the learned representations from random hashing prevents generalization across similar items, causing problems of learning unseen and long-tail items, especially when item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random ids. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance of memorization and generalization, we propose to use Semantic IDs -- a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items -- as a replacement for random item ids. Similar to content embeddings, the compactness of Semantic IDs poses a problem of easy adaption in recommendation models. We propose novel methods for adapting Semantic IDs in industry-scale ranking models, through hashing sub-pieces of of the Semantic-ID sequences. In particular, we find that the SentencePiece model that is commonly used in LLM tokenization outperforms manually crafted pieces such as N-grams. To the end, we evaluate our approaches in a real-world ranking model for YouTube recommendations. Our experiments demonstrate that Semantic IDs can replace the direct use of video IDs by improving the generalization ability on new and long-tail item slices without sacrificing overall model quality.

Online Access

Open Access (Arxiv) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송