학술논문

Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements

Document Type

Working Paper

Author

Li, Yang; Li, Gang; He, Luheng; Zheng, Jingjie; Li, Hong; Guan, Zhiwei

Source

Subject

Computer Science - Machine Learning
Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Human-Computer Interaction

Language

Abstract

Natural language descriptions of user interface (UI) elements such as alternative text are crucial for accessibility and language-based interaction in general. Yet, these descriptions are constantly missing in mobile UIs. We propose widget captioning, a novel task for automatically generating language descriptions for UI elements from multimodal input including both the image and the structural representations of user interfaces. We collected a large-scale dataset for widget captioning with crowdsourcing. Our dataset contains 162,859 language phrases created by human workers for annotating 61,285 UI elements across 21,750 unique UI screens. We thoroughly analyze the dataset, and train and evaluate a set of deep model configurations to investigate how each feature modality as well as the choice of learning strategies impact the quality of predicted captions. The task formulation and the dataset as well as our benchmark models contribute a solid basis for this novel multimodal captioning task that connects language and user interfaces.
Comment: 16 pages, EMNLP 2020

Online Access

Open Access (Arxiv) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송