Academic Paper

Adaptive Spatial-Temporal Fusion of Multi-Objective Networks for Compressed Video Perceptual Enhancement
Document Type
Conference
Source
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 268-275, Jun. 2021
Subject
Computing and Processing
Training
Computer vision
Adaptation models
Adaptive systems
Fuses
Conferences
Pattern recognition
Language
ISSN
2160-7516
Abstract
Perceptual quality enhancement of heavily compressed videos is a difficult, unsolved problem because no suitable perceptual similarity loss function between video pairs yet exists. Motivated by the observation that it is hard to design a unified training objective that is perceptually friendly for enhancing regions with smooth content and regions with rich textures simultaneously, in this paper we propose a simple yet effective novel solution dubbed "Adaptive Spatial-Temporal Fusion of Two-Stage Multi-Objective Networks" (ASTF) to adaptively fuse the enhancement results from networks trained with two different optimization objectives. Specifically, the proposed ASTF takes an enhanced frame along with its neighboring frames as input and jointly predicts a mask indicating regions with high-frequency texture details. We then use the mask to fuse the two enhancement results, retaining both smooth content and rich textures. Extensive experiments show that our method achieves promising performance on compressed video perceptual quality enhancement.
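The fusion step described in the abstract — blending the outputs of two differently trained networks with a predicted texture mask — can be sketched as a per-pixel convex combination. This is a minimal NumPy illustration, not the authors' implementation; the function name and the assumption that mask values lie in [0, 1] are ours.

```python
import numpy as np

def fuse_enhancements(out_smooth, out_texture, mask):
    """Blend two enhancement results with a per-pixel mask.

    mask values near 1 select the texture-oriented network's output
    (regions with high-frequency detail); values near 0 select the
    smooth-content-oriented output. All arrays share shape (H, W, C).
    Hypothetical sketch of the fusion described in the abstract.
    """
    mask = np.clip(mask, 0.0, 1.0)  # keep the blend a convex combination
    return mask * out_texture + (1.0 - mask) * out_smooth
```

In the paper the mask itself is predicted by a network from the frame and its temporal neighbors; here it is simply taken as a given input.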