Academic Paper

Scene Text Recognition with Multi-Encoders
Document Type
Conference
Source
제어로봇시스템학회 국제학술대회 논문집. 2022-11 2022(11):1615-1620
Subject
Scene text recognition
Transformer
Deep learning
Convolutional neural network
Language
Korean
ISSN
2005-4750
Abstract
Although text recognition has advanced significantly over the years, current models still struggle with irregular text images, which feature complex backgrounds, curved text, diverse fonts, and distortions. CNN-based text recognition networks perform well but still face these challenges. Recently, transformer-based feature extractors have shown clear advantages for global feature extraction from images. For irregular text images in particular, self-attention can establish connections between all parts of the image, which reduces the influence of irregularly distributed characters. This paper therefore proposes MESTR (Multi-Encoders Scene Text Recognition), which combines a CNN-based [1][2][6] feature extractor with a transformer-based feature extractor. MESTR extracts local and global features of a text image simultaneously and then integrates the global features into the local features. During training, CTC [6] guides the decoder as a compensatory training strategy for the attentional decoder. Experimental results show that MESTR achieves competitive results on all seven benchmarks. Ablation experiments further demonstrate the effectiveness of each improved component of the text recognition model.
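The abstract does not say how global features are integrated into local features or how the CTC and attention objectives are weighted. The sketch below is only one plausible reading, assuming a residual cross-attention fusion (each local feature queries the global features) and a weighted sum of the two losses. The names `fuse_global_into_local`, `joint_loss`, and `ctc_weight` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_global_into_local(local_feats, global_feats):
    """Assumed fusion: residual cross-attention.

    local_feats:  list of vectors, standing in for CNN (local) features.
    global_feats: list of vectors, standing in for transformer (global) features.
    Each local vector attends over all global vectors and adds the
    attention-weighted global context to itself.
    """
    dim = len(local_feats[0])
    scale = math.sqrt(dim)
    fused = []
    for query in local_feats:
        # Scaled dot-product score between this local vector and each global vector.
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in global_feats]
        weights = softmax(scores)
        # Attention-weighted average of the global vectors.
        context = [sum(w * g[i] for w, g in zip(weights, global_feats))
                   for i in range(dim)]
        # Residual connection keeps the local information intact.
        fused.append([x + c for x, c in zip(query, context)])
    return fused


def joint_loss(attention_loss, ctc_loss, ctc_weight=0.5):
    """Assumed objective: attention loss plus a weighted CTC guide term."""
    return attention_loss + ctc_weight * ctc_loss


# Toy inputs: three local positions and two global tokens, feature size 2.
local_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feats = [[0.5, 0.5], [1.0, -1.0]]
fused = fuse_global_into_local(local_feats, global_feats)
```

The fused sequence keeps the length of the local sequence, so under these assumptions it could feed both a CTC head and an attentional decoder without reshaping.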
