Academic Paper

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Document Type

Working Paper

Author

Rae, Jack W.; Borgeaud, Sebastian; Cai, Trevor; Millican, Katie; Hoffmann, Jordan; Song, Francis; Aslanides, John; Henderson, Sarah; Ring, Roman; Young, Susannah; Rutherford, Eliza; Hennigan, Tom; Menick, Jacob; Cassirer, Albin; Powell, Richard; Driessche, George van den; Hendricks, Lisa Anne; Rauh, Maribeth; Huang, Po-Sen; Glaese, Amelia; Welbl, Johannes; Dathathri, Sumanth; Huang, Saffron; Uesato, Jonathan; Mellor, John; Higgins, Irina; Creswell, Antonia; McAleese, Nat; Wu, Amy; Elsen, Erich; Jayakumar, Siddhant; Buchatskaya, Elena; Budden, David; Sutherland, Esme; Simonyan, Karen; Paganini, Michela; Sifre, Laurent; Martens, Lena; Li, Xiang Lorraine; Kuncoro, Adhiguna; Nematzadeh, Aida; Gribovskaya, Elena; Donato, Domenic; Lazaridou, Angeliki; Mensch, Arthur; Lespiau, Jean-Baptiste; Tsimpoukelli, Maria; Grigorev, Nikolai; Fritz, Doug; Sottiaux, Thibault; Pajarskas, Mantas; Pohlen, Toby; Gong, Zhitao; Toyama, Daniel; d'Autume, Cyprien de Masson; Li, Yujia; Terzi, Tayfun; Mikulik, Vladimir; Babuschkin, Igor; Clark, Aidan; Casas, Diego de Las; Guy, Aurelia; Jones, Chris; Bradbury, James; Johnson, Matthew; Hechtman, Blake; Weidinger, Laura; Gabriel, Iason; Isaac, William; Lockhart, Ed; Osindero, Simon; Rimell, Laura; Dyer, Chris; Vinyals, Oriol; Ayoub, Kareem; Stanway, Jeff; Bennett, Lorrayne; Hassabis, Demis; Kavukcuoglu, Koray; Irving, Geoffrey

Subject

Computer Science - Computation and Language
Computer Science - Artificial Intelligence

Abstract

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.
Comment: 120 pages

