Academic Paper

A Semi-supervised Deep Learning-Based Solver for Breaking Text-Based CAPTCHAs
Document Type
Conference
Source
2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 614-619, Oct. 2021
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Deep learning
Image segmentation
Feature extraction
Real-time systems
Data models
Security
Character recognition
Text-based CAPTCHAs
CNN
Seq2Seq
semi-supervised
deep learning
Language
ISSN
2324-9013
Abstract
Text-based CAPTCHAs remain the most widely used type of CAPTCHA, and many researchers have proposed attacks to break them. Earlier segmentation-based attacks require at least three steps (preprocessing, segmentation, and recognition), so each CAPTCHA scheme needs its own preprocessing and segmentation algorithms. In recent years, a series of deep learning (DL) models have been designed for cracking text-based CAPTCHAs. However, these methods require annotating a large number of images, which is time-consuming and labor-intensive. In this paper, we propose a semi-supervised DL-based solver for breaking text-based CAPTCHAs that uses a small number of labeled CAPTCHAs to obtain a high-performance attack model. The CNN module and the attention-based Seq2Seq module are its two key components, providing effective feature extraction and character recognition. Experimental results show that our solver successfully attacked nine of the most popular text-based CAPTCHA schemes, with a higher attack success rate than the four latest attack models. In addition, our model performs no data preprocessing and attacks quickly, making it more suitable for real-time attacks. The code and dataset are available on GitHub.
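Illustrative note: the abstract names an attention-based Seq2Seq module but does not say which attention variant it uses. The sketch below assumes plain dot-product attention and shows a single decoding step over a handful of feature vectors that stand in for CNN output. The function names `softmax` and `attend`, the feature layout (one vector per image column), and the toy numbers are all hypothetical and are not taken from the paper or its code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_features):
    """One dot-product attention step (assumed variant, not the paper's).

    decoder_state: list of floats, a hypothetical decoder hidden state.
    encoder_features: list of equally sized vectors, one per image column,
        standing in for the CNN feature map.
    Returns (weights, context): the attention weights over the columns and
    the weighted sum of the feature vectors.
    """
    # Score each feature vector by its dot product with the decoder state.
    scores = [
        sum(q * k for q, k in zip(decoder_state, feat))
        for feat in encoder_features
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted average of the features.
    dim = len(decoder_state)
    context = [
        sum(w * feat[i] for w, feat in zip(weights, encoder_features))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three image columns with two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(state, features)
```

In a full decoder, a step like this would run once per output character, so the model can focus on a different region of the image for each character without an explicit segmentation stage.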