Academic Paper

Scene Text Recognition with Multi-Encoders

Document Type

Conference

Author

Yao Wang; Jong-Eun Ha

Source

Proceedings of the International Conference of the Institute of Control, Robotics and Systems (제어로봇시스템학회 국제학술대회 논문집). 2022-11 2022(11):1615-1620

Subject

Scene text recognition
Transformer
Deep learning
Convolutional neural network

Language

Korean

ISSN

2005-4750

Abstract

Although text recognition has advanced considerably in recent years, current models still struggle with irregular text images, which involve complex backgrounds, curved text, diverse fonts, distortions, and similar difficulties. CNN-based text recognition networks perform well but remain subject to these challenges. Recently, transformer-based feature extractors have shown clear advantages for extracting global features from images. For irregular text images in particular, self-attention can relate the different parts of the image to one another, which reduces the influence of irregularly distributed characters. This paper therefore proposes MESTR (Multi-Encoders Scene Text Recognition), which combines a CNN-based [1][2][6] feature extractor with a transformer-based feature extractor. MESTR extracts local and global features of a text image simultaneously and then integrates the global features into the local features. During training, CTC [6] is used in the decoder as guide training, serving as a compensating training strategy for the attentional decoder. Experimental results show that the proposed MESTR achieves competitive results on all seven benchmarks. Ablation experiments are also provided to show the effectiveness of each improved component of the text recognition model.
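
The abstract mentions a CTC branch in the decoder but does not describe how it is configured. As a general illustration only, and not the authors' implementation, the following minimal sketch shows how per-frame outputs of a CTC head are typically turned into text by best-path decoding: take the top class per frame, merge consecutive repeats, and drop blanks. The function name `ctc_greedy_decode`, the blank index `0`, and the toy character set are assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, charset):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: list of per-frame score lists; class 0 is the blank,
                  class k (k >= 1) maps to charset[k - 1].
    """
    # Pick the highest-scoring class index for every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive indices into one.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Remove blanks and map the remaining indices to characters.
    return "".join(charset[idx - 1] for idx in collapsed if idx != BLANK)


# Toy input: frames predict a, a, blank, a, b.
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
decoded = ctc_greedy_decode(frames, "ab")
```

The blank between the second and third `a` frames is what lets CTC keep a genuinely repeated character instead of merging it away.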
