Academic Paper

Deep Video Decaptioning via Subtitle Mask Prediction and Inpainting
Document Type
Conference
Source
2022 IEEE 5th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). 5:1836-1839 Dec, 2022
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Engineering Profession
Robotics and Control Systems
Visualization
Automation
Streaming media
Maintenance engineering
Real-time systems
Information management
video decaptioning
mask prediction
deep learning
encoder-decoder
Language
ISSN
2693-2776
Abstract
Video decaptioning is the process of automatically removing subtitles from video frames and inpainting the captioned regions. However, directly transferring deep-learning-based image inpainting methods to video decaptioning requires masks of the subtitled regions, which are unavailable for subtitled video frames. To address this issue, we propose a two-stage lightweight framework. The first stage, caption mask prediction, uses an encoder-decoder fully convolutional network with residual blocks to predict the caption mask. The second stage, background inpainting, uses an encoder-decoder structure with attention modules and skip connections to repair the background areas. Extensive experiments demonstrate that our proposed model produces better visual results and outperforms state-of-the-art methods.
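
The two-stage data flow described in the abstract (predict a caption mask, then inpaint the masked pixels) can be illustrated with a minimal sketch. This is not the authors' model: both learned networks are replaced by deliberately simple stand-ins (brightness thresholding for mask prediction, neighbour averaging for inpainting), and every name here (`predict_caption_mask`, `inpaint_background`, `decaption`, the `threshold` parameter) is a hypothetical choice made for illustration only.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame, pixel values in [0, 1]
Mask = List[List[int]]     # 1 = caption pixel, 0 = background pixel


def predict_caption_mask(frame: Frame, threshold: float = 0.9) -> Mask:
    """Stage 1 stand-in: mark very bright pixels as caption.

    The paper uses a learned encoder-decoder network for this stage;
    thresholding is only an assumed placeholder so the pipeline runs.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in frame]


def inpaint_background(frame: Frame, mask: Mask, max_iters: int = 50) -> Frame:
    """Stage 2 stand-in: fill masked pixels from known 4-neighbours.

    The paper uses an encoder-decoder with attention and skip connections;
    this placeholder grows the known region inward one layer per pass.
    """
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    known = [[mask[y][x] == 0 for x in range(w)] for y in range(h)]
    for _ in range(max_iters):
        updates = []
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                # Collect values of in-bounds neighbours that are already known.
                vals = [
                    out[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w and known[ny][nx]
                ]
                if vals:
                    updates.append((y, x, sum(vals) / len(vals)))
        if not updates:
            break  # nothing left to fill, or no known pixels to fill from
        # Apply after the scan so each pass uses only previously known pixels.
        for y, x, value in updates:
            out[y][x] = value
            known[y][x] = True
    return out


def decaption(frame: Frame, threshold: float = 0.9) -> Frame:
    """Run both stages: the predicted mask feeds the inpainting step."""
    mask = predict_caption_mask(frame, threshold)
    return inpaint_background(frame, mask)
```

The point of the sketch is the interface between the stages: because the mask is predicted from the frame itself, the inpainting step needs no externally supplied mask, which is the gap the abstract identifies in directly reusing image inpainting methods.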