Academic Paper

Saliency-Guided Transformer Network combined with Local Embedding for No-Reference Image Quality Assessment
Document Type
Conference
Source
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 1953-1962, Oct. 2021
Subject
Computing and Processing
Image quality
Adaptation models
Visualization
Image resolution
Machine vision
Predictive models
Transformers
Language
English
ISSN
2473-9944
Abstract
No-Reference Image Quality Assessment (NR-IQA) methods based on the Vision Transformer have recently drawn much attention for their superior performance. Unfortunately, being crude combinations of NR-IQA and the Transformer, they can hardly take advantage of the strengths of either. In this paper, we propose a novel Saliency-Guided Transformer Network combined with Local Embedding (TranSLA) for No-Reference Image Quality Assessment. Our TranSLA integrates information at different levels into a robust representation. Existing research has shown that the human visual system concentrates more on regions of interest (RoIs) when assessing image quality. We therefore combine saliency prediction with the Transformer to guide the model to highlight the RoI when aggregating global information. In addition, we introduce a local embedding for the Transformer based on the gradient map. Since the gradient map captures fine structural features, it serves as a supplement that offers local information to the Transformer, so that both local and non-local information can be exploited. Moreover, to accelerate the aggregation of information from all tokens, we introduce a Boosting Interaction Module (BIM) that enhances feature aggregation by forcing patch tokens to interact better with class tokens at all levels. Experiments on two large-scale NR-IQA benchmarks demonstrate that our method significantly outperforms the state-of-the-art.
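The gradient-map supplement mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify which gradient operator TranSLA uses, so this example assumes 3x3 Sobel filtering, a common choice for extracting the structural detail such a local embedding would feed to the Transformer; the input is a grayscale image given as a 2D list.

```python
import math

def gradient_map(img):
    """Return a per-pixel gradient-magnitude map using 3x3 Sobel kernels.

    Border pixels are left at 0 since the kernels need a full 3x3
    neighborhood. High responses mark edges and fine structure, i.e.
    the local detail a gradient-map embedding emphasizes.
    """
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)  # gradient magnitude
    return out
```

For example, an image that is 0 on its left half and 1 on its right half produces a strong response along the vertical boundary and zero response in the flat regions, which is exactly the kind of structure-versus-texture separation the local embedding relies on.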