Academic Paper

A Transformer-based Siamese Network For Word Image Retrieval In Historical Documents
Document Type
Conference
Source
2023 IEEE Smart World Congress (SWC), pp. 696-703, Aug 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Historical Arabic documents
Word spotting
Vision transformer
Siamese network
Transfer learning
Learning representation
Language
Abstract
The increasing availability of digitized historical documents has created a need for effective tools to extract the information they contain. Word spotting, a focus area in historical document analysis, involves locating specific words within document images. In this paper, we propose a novel approach to word spotting in historical Arabic documents that learns improved feature representations of word images. More precisely, we put forward an end-to-end approach for generating word image descriptors, based on a Siamese vision transformer architecture and trained with a contrastive loss objective. Additionally, we apply transfer learning, leveraging knowledge acquired from two distinct source domains to improve generalization. The proposed word spotting system projects the query word image and all reference word images into a shared embedding space, where their similarity is determined from their corresponding embedding vectors. Our method is evaluated on the historical Arabic VML-HD dataset, and the results indicate that our approach significantly outperforms state-of-the-art methods.
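The two mechanisms named in the abstract, a contrastive objective over image pairs and retrieval by similarity in the embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the margin value, the use of Euclidean distance, and the toy 2-D vectors are all assumptions made for illustration, and the loss shown is one common margin-based form rather than the exact formulation used in the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_word, margin=1.0):
    """Margin-based contrastive loss for a single pair of embeddings.

    same_word is True when both images show the same word.
    The margin is an assumed hyperparameter; the abstract does not state one.
    """
    d = euclidean(u, v)
    if same_word:
        # Matching pair: penalize any distance between the embeddings.
        return d ** 2
    # Non-matching pair: penalize only if closer than the margin.
    return max(0.0, margin - d) ** 2


def rank_references(query, references):
    """Return reference indices ordered from nearest to farthest from the query."""
    return sorted(range(len(references)),
                  key=lambda i: euclidean(query, references[i]))


# Toy 2-D vectors standing in for the outputs of the shared encoder
# (hypothetical values, not taken from the paper).
query = [0.0, 0.0]
references = [[3.0, 4.0], [0.1, 0.0], [1.0, 1.0]]
ranking = rank_references(query, references)
```

In a real system the vectors would come from the same transformer encoder applied to the query and to each reference word image, and the ranked list would be returned as the word spotting result.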