Academic Paper
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Document Type
Working Paper
Author
Botev, Aleksandar; De, Soham; Smith, Samuel L; Fernando, Anushan; Muraru, George-Cristian; Haroun, Ruba; Berrada, Leonard; Pascanu, Razvan; Sessa, Pier Giuseppe; Dadashi, Robert; Hussenot, Léonard; Ferret, Johan; Girgin, Sertan; Bachem, Olivier; Andreev, Alek; Kenealy, Kathleen; Mesnard, Thomas; Hardin, Cassidy; Bhupatiraju, Surya; Pathak, Shreya; Sifre, Laurent; Rivière, Morgane; Kale, Mihir Sanjay; Love, Juliette; Tafti, Pouya; Joulin, Armand; Fiedel, Noah; Senter, Evan; Chen, Yutian; Srinivasan, Srivatsan; Desjardins, Guillaume; Budden, David; Doucet, Arnaud; Vikram, Sharad; Paszke, Adam; Gale, Trevor; Borgeaud, Sebastian; Chen, Charlie; Brock, Andy; Paterson, Antonia; Brennan, Jenny; Risdal, Meg; Gundluru, Raj; Devanathan, Nesh; Mooney, Paul; Chauhan, Nilay; Culliton, Phil; Martins, Luiz Gustavo; Bandy, Elisa; Huntsperger, David; Cameron, Glenn; Zucker, Arthur; Warkentin, Tris; Peran, Ludovic; Giang, Minh; Ghahramani, Zoubin; Farabet, Clément; Kavukcuoglu, Koray; Hassabis, Demis; Hadsell, Raia; Teh, Yee Whye; de Freitas, Nando
Source
Subject
Language
English
Abstract
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants for both. Our models achieve comparable performance to similarly sized Gemma baselines despite being trained on fewer tokens.
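As a rough illustration of the abstract's point that a fixed-size recurrent state keeps memory flat during long-sequence inference, the sketch below implements a per-channel gated linear recurrence in NumPy. This is not the RecurrentGemma or Griffin code; the decay gate, dimensions, and function name are invented for the example, and real Griffin blocks additionally interleave such recurrences with local attention and learned gating.

import numpy as np

def linear_recurrence(x, a):
    # Scan h_t = a * h_{t-1} + (1 - a) * x_t over time.
    # x: (seq_len, dim) input sequence.
    # a: (dim,) per-channel decay in (0, 1); held constant here purely for illustration.
    # The recurrent state h always has shape (dim,), unlike a transformer KV cache,
    # which grows linearly with sequence length.
    h = np.zeros(x.shape[1])          # fixed-size recurrent state
    outputs = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + (1.0 - a) * x[t]  # same state size at every step
        outputs[t] = h
    return outputs

# Usage: processing a long sequence still needs only a (dim,)-sized state.
rng = np.random.default_rng(0)
x = rng.normal(size=(4096, 8))        # seq_len = 4096, dim = 8
a = np.full(8, 0.9)                   # constant per-channel decay (illustrative)
y = linear_recurrence(x, a)
print(y.shape)                        # (4096, 8)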