Academic Paper

Holistic Evaluation of Text-To-Image Models

Document Type

Working Paper

Author

Lee, Tony; Yasunaga, Michihiro; Meng, Chenlin; Mai, Yifan; Park, Joon Sung; Gupta, Agrim; Zhang, Yunzhi; Narayanan, Deepak; Teufel, Hannah Benita; Bellagente, Marco; Kang, Minguk; Park, Taesung; Leskovec, Jure; Zhu, Jun-Yan; Fei-Fei, Li; Wu, Jiajun; Ermon, Stefano; Liang, Percy

Subject

Computer Science - Computer Vision and Pattern Recognition
Computer Science - Machine Learning

Abstract

The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/v1.1.0 and the code at https://github.com/stanford-crfm/helm, which is integrated with the HELM codebase.
Comment: NeurIPS 2023. First three authors contributed equally.

Online Access

Open Access (arXiv)