Academic Journal Article

Exploring Sparse Visual Odometry Acceleration With High-Level Synthesis
Document Type
Periodical
Source
IEEE Access. 11:70741-70763, 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Field programmable gate arrays
Pose estimation
Visual odometry
Data transfer
Cameras
Task analysis
Odometry
FPGA
high-level synthesis
performance optimization
pose estimation
visual odometry
Zynq
SLAM
ISSN
2169-3536
Abstract
Visual Odometry (VO) systems are widely used to determine the position and orientation of a robot or camera in an unknown environment. They are deployed on resource-constrained platforms such as drones and virtual reality or augmented reality headsets. VO systems that harness modern Systems-on-Chip (SoCs) with integrated Field Programmable Gate Arrays (FPGAs) have the potential to improve overall performance. This paper explores the FPGA acceleration of sparse semi-direct VO kernels using High-Level Synthesis (HLS). The selected sparse Semi-direct VO (SVO) system was designed from its conception to execute efficiently on low-power processors. We show that both the computational overhead and the data transfer overhead between the processing cores and the accelerators on the reconfigurable fabric need to be optimized to obtain better end-to-end performance. The additional data movement incurred when using an FPGA accelerator is due to the sparse computational nature of the kernels together with their random memory access patterns. This paper shows that state-of-the-art HLS tools are not yet able to perform the required optimizations automatically; these tools usually target application kernels with dense computational patterns and regular memory access. We propose three potentially general methods to reduce the data transfer between the processing cores and the customised hardware kernels on the FPGA: (a) approximation based on domain-specific knowledge, (b) lossless image compression, and (c) on-the-fly computation. We present a case study applying these methods to SVO, a state-of-the-art sparse VO system with a semi-direct front-end. We demonstrate that the proposed methods reduce data transfer overhead to achieve better end-to-end performance, and that they can be applied not only with standard Xilinx tools but also with other state-of-the-art HLS tools, such as HeteroFlow.
Compared to the baseline performance of the original SVO software on Arm processors, our proposed methods enable the Xilinx SDSoC and HeteroFlow designs to achieve speedups of $2.4\times$ and $2.14\times$, respectively, without noticeable accuracy loss. The Xilinx SDSoC and HeteroFlow designs also achieve $1.85\times$ and $1.89\times$ improvements in energy efficiency, respectively, on a Xilinx Zynq UltraScale+ SoC with Arm A53 cores and integrated FPGA. Compared to the SVO software baseline running on the Intel Xeon system, our proposed methods enable the Xilinx SDSoC and HeteroFlow designs to achieve $8.2\times$ and $8.3\times$ improvements in energy efficiency, respectively.