Academic Paper

HDSuper: High-Quality and High Computational Utilization Edge Super-Resolution Accelerator With Hardware-Algorithm Co-Design Techniques

Document Type

Periodical

Author

Zhao, X.; Chang, L.; Fan, D.; Hu, Z.; Yue, T.; Tu, F.; Zhou, J.

Source

IEEE Transactions on Circuits and Systems I: Regular Papers (IEEE Trans. Circuits Syst. I). 71(4):1679-1692, Apr. 2024

Subject

Components, Circuits, Devices and Systems
Superresolution
Hardware
Feature extraction
Convolution
Image reconstruction
Inference algorithms
Computational efficiency
Super-resolution
co-design
efficient mapping
high-quality image
ASIC
FPGA

Language

ISSN

1549-8328
1558-0806

Abstract

Super-resolution (SR) techniques have been employed to construct high-definition images from low-quality images. Various neural networks have demonstrated excellent image-reconstruction quality in SR accelerators. However, deploying SR networks on edge devices is constrained by limited resources and by the power consumption that results from large parameter counts, high computational complexity, and frequent external memory accesses. This work explores hardware-algorithm co-design techniques to provide an end-to-end platform comprising a lightweight super-resolution network (LSR) and an efficient, high-quality SR accelerator, HDSuper. For algorithm design, improved depth-wise separable convolution and pixelshuffle layers are developed to reduce network size and computational complexity while taking the hardware constraints into account. In addition, improved channel attention (CA) blocks enhance the image reconstruction quality. For hardware accelerator design, we design a unified computing core (UCC) combined with an efficient flattening-and-allocation (F-A) mapping strategy to support various operators with high computational utilization. We also design a patch computing scheme to reduce the external memory accesses of the hardware architecture. Based on the evaluation, the proposed algorithm achieves high-quality image reconstruction with $37.44$ dB PSNR. Finally, an FPGA demonstration and an ASIC layout in UMC 55 nm are achieved with low power consumption ($2.08$ W and $152$ mW, respectively) and with the lowest hardware resource usage compared to state-of-the-art works.
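
As a rough illustration of two quantities the abstract refers to, the sketch below compares the weight count of a standard convolution with that of a textbook depth-wise separable convolution, and computes PSNR from a mean squared error. This is not the paper's "improved" layer or its evaluation code: the function names, the bias-free weight count, and the 8-bit peak value of 255 are all assumptions made for the example.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depth-wise conv followed by a 1 x 1 point-wise conv.

    Textbook formulation only; the paper's improved variant is not reproduced here.
    """
    return c_in * k * k + c_in * c_out


def psnr(reference, reconstructed, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    max_value=255.0 assumes 8-bit images (an assumption, not from the source).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical layer shape: 64 input channels, 64 output channels, 3 x 3 kernel.
    print(standard_conv_params(64, 64, 3))
    print(depthwise_separable_params(64, 64, 3))
    print(psnr([100, 100], [110, 90]))
```

For the hypothetical 64-channel, 3 x 3 layer, the separable form needs roughly one eighth of the weights, which is the kind of reduction that makes such layers attractive on resource-limited edge hardware.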
