Journal Article

Multi-Modal Learning-Based Blind Video Quality Assessment Metric for Synthesized Views
Document Type
Periodical
Source
IEEE Transactions on Broadcasting, 70(1):208-222, Mar. 2024
Subject
Communication, Networking and Broadcast Technologies
Measurement
Distortion
Feature extraction
Quality assessment
Video recording
Visualization
Convolutional neural networks
Synthesized video quality assessment
no-reference
multi-modal learning
sparse dictionary
ISSN
0018-9316
1557-9611
Abstract
The quality degradation of synthesized video directly affects the widespread adoption of immersive video, so it is crucial to design a quality assessment model that can determine whether a synthesized video meets the requirements of commercial broadcasting. However, designing a general-purpose no-reference quality assessment metric for synthesized videos is difficult due to imperfect view synthesis technology and scene diversity. Existing quality assessment algorithms for synthesized views are mostly based on handcrafted feature extraction. Inspired by the theory that input stimuli are hierarchically and sparsely processed in the cerebral cortex, we combine Convolutional Neural Network (CNN) learning and sparse dictionary learning mechanisms and propose a Multi-Modal Learning based Blind Synthesized Video Quality Assessment (MML-BSVQA) metric. First, to better capture spatio-temporal distortions, we convert the synthesized video into the Spatial Domain (SD), Vertical Temporal Domain (VTD) and Horizontal Temporal Domain (HTD) using a video decomposition operation combined with optical flow estimation. Second, we extract deep semantic features from the three domains with a pre-trained CNN model. Third, we represent the sparse features of the three domains using separately trained over-complete sparse dictionaries. Note that both the CNN model and the sparse dictionaries are trained on natural videos to preserve the general-purpose nature of the proposed MML-BSVQA metric. Finally, the score of a synthesized video is generated by weighted regression. Experimental results on three synthesized video databases demonstrate that the proposed metric outperforms classic and state-of-the-art quality assessment metrics.
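Two of the steps the abstract names, the three-domain video decomposition and the over-complete sparse representation, can be sketched in plain NumPy. This is an illustrative sketch, not the authors' implementation: the function names `decompose_video` and `omp`, the `(T, H, W)` tensor layout, and the greedy Orthogonal Matching Pursuit solver are all assumptions, and the optical flow estimation, pre-trained CNN, dictionary training, and weighted regression stages are omitted.

```python
import numpy as np

def decompose_video(video):
    """Slice a grayscale video tensor of shape (T, H, W) into the three
    domains named in the abstract (assumed layout, for illustration):
      SD  -> T spatial slices of shape (H, W)    (the original frames)
      VTD -> W vertical temporal slices (T, H)   (fix a column, vary time)
      HTD -> H horizontal temporal slices (T, W) (fix a row, vary time)
    """
    sd = video                       # (T, H, W)
    vtd = video.transpose(2, 0, 1)   # (W, T, H)
    htd = video.transpose(1, 0, 2)   # (H, T, W)
    return sd, vtd, htd

def omp(D, x, k):
    """Greedy Orthogonal Matching Pursuit: approximate signal x with at
    most k atoms of an over-complete dictionary D (n x m, columns
    assumed unit-norm). Returns a sparse coefficient vector of length m."""
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit x on the selected atoms by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code
```

In the paper's pipeline, the pre-trained CNN would run over each domain's slices and the resulting features, rather than raw pixels, would be sparse-coded against the corresponding trained dictionary; here the dictionary and signal in any usage would be toy placeholders.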