Academic Paper

22.7 DL-VOPU: An Energy-Efficient Domain-Specific Deep-Learning-Based Visual Object Processing Unit Supporting Multi-Scale Semantic Feature Extraction for Mobile Object Detection/Tracking Applications
Document Type
Conference
Source
2023 IEEE International Solid-State Circuits Conference (ISSCC), pp. 1-3, Feb. 2023
Subject
Bioengineering
Components, Circuits, Devices and Systems
Computing and Processing
Visualization
Program processors
Semantics
Redundancy
Computer architecture
Streaming media
Feature extraction
Language
ISSN
2376-8606
Abstract
In recent years, deep learning-based visual object detection/tracking (VODT) has been widely used in intelligent applications such as autonomous driving, UAVs, smart robots, and VR/AR. As general AI hardware platforms, GPUs and general AI processors are often used to accelerate VODT. However, without a domain-specific architecture, it is difficult for these processors to achieve high energy efficiency, making them unsuitable for mobile VODT applications. Recently, some dedicated VODT processors have been proposed with improved energy efficiency [1]–[3]. As shown in Fig. 22.7.1, these designs have several issues: 1) they only support a single task (either detection or tracking), 2) they lack full support for state-of-the-art VODT frameworks based on multi-scale semantic feature extraction (MSFE) [4], and 3) they do not sufficiently exploit domain-specific features for energy efficiency optimization. To address these issues, this work proposes a deep learning-based visual object processor (named DL-VOPU) for mobile VODT applications. It exploits diverse domain-specific features to achieve record-high energy efficiency for VODT, while supporting MSFE-based VODT frameworks with a programmable backbone network. The DL-VOPU features: 1) an energy-efficient MSFE-aware AI architecture, 2) an object-oriented adaptive computing technique for energy-efficient object tracking, 3) a parallel frame-difference computing technique for energy-efficient neural network (NN) computation on video streams, and 4) a unified data compression & computing technique to address data redundancy in VODT processing.