Academic Journal Article

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Document Type
Working Paper
Source
Subject
Computer Science - Machine Learning
Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Language
English
Abstract
We introduce RecurrentGemma, a family of open language models that uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language tasks. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
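The abstract's claim that a fixed-size recurrent state keeps memory constant over long sequences can be made concrete with a small sketch. The following is a minimal illustrative example of a diagonal linear recurrence scanned over a sequence; it is not the actual Griffin or RecurrentGemma block, and the gating and parameterisation shown here are simplified assumptions for illustration only.

```python
# Minimal sketch (assumed, not the paper's implementation): a diagonal
# linear recurrence whose carried state has a fixed size, independent of
# sequence length.

import jax
import jax.numpy as jnp


def linear_recurrence(x, log_a):
    """Scan h_t = a * h_{t-1} + (1 - a) * x_t over the time axis.

    x:     (seq_len, d_model) input sequence.
    log_a: (d_model,) parameter; a = sigmoid(log_a) lies in (0, 1).

    The carried state h has shape (d_model,) no matter how long the
    sequence is, which is why memory use stays constant during inference.
    """
    a = jax.nn.sigmoid(log_a)

    def step(h, x_t):
        h = a * h + (1.0 - a) * x_t
        return h, h  # carry the updated state and emit it as the output

    h0 = jnp.zeros(x.shape[-1])
    _, outputs = jax.lax.scan(step, h0, x)
    return outputs  # (seq_len, d_model)


# Usage: the state passed between steps is d_model floats, unlike a
# transformer KV cache, which grows linearly with sequence length.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 8))   # seq_len=16, d_model=8
y = linear_recurrence(x, jnp.zeros(8))
print(y.shape)                        # (16, 8)
```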