학술논문

Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models

Document Type

Working Paper

Author

El-Hajj, Hassan; Valleriani, Matteo

Source

Subject

Computer Science - Computer Vision and Pattern Recognition
Computer Science - Artificial Intelligence
Computer Science - Digital Libraries

Language

Abstract

In this paper, we present a pipeline for image extraction from historical documents using foundation models, and evaluate text-image prompts and their effectiveness on humanities datasets of varying levels of complexity. The motivation for this approach stems from the high interest of historians in visual elements printed alongside historical texts on the one hand, and from the relative lack of well-annotated datasets within the humanities when compared to other domains. We propose a sequential approach that relies on GroundDINO and Meta's Segment-Anything-Model (SAM) to retrieve a significant portion of visual data from historical documents that can then be used for downstream development tasks and dataset creation, as well as evaluate the effect of different linguistic prompts on the resulting detections.
Comment: 12 pages, 3 figures, Accepted in ICIAP2023, AI4DH workshop

Online Access

Open Access (Arxiv) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송