Academic Paper

Towards a high-performance AI compiler with upstream MLIR
Document Type
Working Paper
Source
Subject
Computer Science - Programming Languages
Computer Science - Artificial Intelligence
Computer Science - Hardware Architecture
Computer Science - Distributed, Parallel, and Cluster Computing
Computer Science - Machine Learning
Language
English
Abstract
This work proposes a compilation flow that uses open-source compiler passes to build a framework achieving ninja performance from a generic, high-level linear-algebra abstraction. We demonstrate this flow with a proof-of-concept MLIR project that takes Linalg-on-Tensor input IR from TensorFlow and PyTorch, performs cache-level optimizations, and lowers to micro-kernels for efficient vectorization, achieving over 90% of the performance of equivalent ninja-written programs. The contributions of this work include: (1) packing primitives on the tensor dialect, plus passes for cache-aware distribution of tensors (single- and multi-core) and for type-aware instructions (VNNI, BFDOT, BFMMLA), including propagation of shapes across the entire function; (2) a linear-algebra pipeline, including tiling, fusion, and bufferization strategies, to lower model-level IR into hardware-friendly tile calls; (3) a mechanism for lowering micro-kernels to an open-source library that supports various CPUs.
Comment: 13 pages, 8 figures, presented at CGO C4ML 2024 & MLIR Workshop EuroLLVM 2024
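As a rough illustration of the Linalg-on-Tensor abstraction described in the abstract, the sketch below shows a matmul on immutable tensors, as emitted by TensorFlow/PyTorch frontends, followed by a cache-aware relayout using the upstream tensor.pack primitive. The shapes and the 32x32 tile size are illustrative assumptions, not the paper's choices.

    // A minimal sketch (shapes and tile sizes assumed) of Linalg-on-Tensor
    // input IR: a matmul expressed on immutable tensor values.
    func.func @matmul(%A: tensor<128x256xf32>, %B: tensor<256x512xf32>,
                      %C: tensor<128x512xf32>) -> tensor<128x512xf32> {
      %0 = linalg.matmul ins(%A, %B : tensor<128x256xf32>, tensor<256x512xf32>)
                         outs(%C : tensor<128x512xf32>) -> tensor<128x512xf32>
      return %0 : tensor<128x512xf32>
    }

    // Cache-aware relayout of the LHS into 32x32 blocks with the upstream
    // tensor.pack op; the 32x32 tile size here is an assumption.
    func.func @pack_lhs(%A: tensor<128x256xf32>) -> tensor<4x8x32x32xf32> {
      %dest = tensor.empty() : tensor<4x8x32x32xf32>
      %packed = tensor.pack %A inner_dims_pos = [0, 1] inner_tiles = [32, 32]
          into %dest : tensor<128x256xf32> -> tensor<4x8x32x32xf32>
      return %packed : tensor<4x8x32x32xf32>
    }

In the flow the abstract outlines, such packed operands would then be tiled, fused, and bufferized before being mapped onto micro-kernel calls in the external library.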