Academic Paper

Llemma: An Open Language Model For Mathematics

Document Type

Working Paper

Author

Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean

Subject

Computer Science - Computation and Language
Computer Science - Artificial Intelligence
Computer Science - Logic in Computer Science

Abstract

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
Comment: Updated references; corrected description of COPRA search budget

Online Access

Open Access (arXiv)
