Academic Paper

DeepText: Detecting Text from the Wild with Multi-ASPP-Assembled DeepLab
Document Type
Conference
Source
2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 208-213, Sep. 2019
Subject
Computing and Processing
Semantics
Feature extraction
Task analysis
Detectors
Decoding
Encoding
Training
Scene text detection, DeepLab, multiple ASPP, auxiliary IoU losses, auxiliary connections
Language
ISSN
2379-2140
Abstract
In this paper, we address scene text detection via direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], to this task. To handle text with arbitrary orientations and sizes and to improve the recall of small text, we propose extracting multi-scale features by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers into DeepLab after feature maps of different resolutions. We then add multiple auxiliary IoU losses at the decoding stage and make auxiliary connections from the intermediate encoding layers to the decoder, to assist network training and enhance the discriminative ability of the lower encoding layers. Experiments on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named DeepText, over state-of-the-art approaches.
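
The abstract mentions auxiliary IoU losses but does not give their formulation. As a rough illustration only, the sketch below assumes a common direct-regression setup in which each pixel predicts its distances to the four sides of an axis-aligned box, and the loss is -ln(IoU). The function names `iou_loss` and `total_loss`, the `aux_weight` value, and the axis-aligned simplification are all assumptions for illustration, not details taken from the paper.

```python
import math


def iou_loss(pred, target, eps=1e-9):
    """-ln(IoU) between two boxes seen from the same pixel.

    Each box is (top, right, bottom, left): non-negative distances from
    the pixel to the four box sides (assumed, axis-aligned encoding).
    """
    pt, pr, pb, pl = pred
    tt, tr, tb, tl = target
    area_pred = (pt + pb) * (pr + pl)
    area_target = (tt + tb) * (tr + tl)
    # Both boxes contain the pixel, so the overlap on each side is the
    # smaller of the two distances.
    inter_h = min(pt, tt) + min(pb, tb)
    inter_w = min(pr, tr) + min(pl, tl)
    inter = inter_h * inter_w
    union = area_pred + area_target - inter
    return -math.log((inter + eps) / (union + eps))


def total_loss(main_pred, aux_preds, target, aux_weight=0.5):
    """Main IoU loss plus down-weighted auxiliary IoU losses.

    aux_preds holds predictions from intermediate decoder stages;
    aux_weight is a hypothetical hyperparameter.
    """
    loss = iou_loss(main_pred, target)
    for aux_pred in aux_preds:
        loss += aux_weight * iou_loss(aux_pred, target)
    return loss


if __name__ == "__main__":
    gt = (2.0, 2.0, 2.0, 2.0)
    coarse = (1.0, 1.0, 1.0, 1.0)  # e.g. an auxiliary-stage prediction
    print(total_loss(gt, [coarse], gt))
```

In a sketch like this, the auxiliary terms give intermediate stages their own supervision signal during training and would typically be dropped at inference time. Handling arbitrary orientations would need an additional angle term, which is omitted here.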