Academic Paper

Fast Motion Understanding with Spatiotemporal Neural Networks and Dynamic Vision Sensors
Document Type
Conference
Source
2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 14098-14104, May 2021
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Dynamics
Toy manufacturing industry
Vision sensors
Filtering algorithms
Feature extraction
Robot sensing systems
Spatiotemporal phenomena
Convolutional neural networks
Voltage control
Standards
Language
English
ISSN
2577-087X
Abstract
This paper presents a Dynamic Vision Sensor (DVS) based system for reasoning about high-speed motion. As a representative scenario we consider a robot at rest, reacting to a small, fast-approaching object at speeds higher than 15 m/s. Since conventional image sensors at typical frame rates observe such an object for only a few frames, estimating the underlying motion presents a considerable challenge for standard computer vision systems and algorithms. We present a method motivated by how animals such as insects solve this problem with their relatively simple vision systems. Our solution takes the event stream from a DVS and first encodes the temporal events with a set of causal exponential filters across multiple time scales. We couple these filters with a Convolutional Neural Network (CNN) to efficiently extract relevant spatiotemporal features. The combined network learns to output both the expected time to collision of the object, as well as the predicted collision point on a discretized polar grid. These critical estimates are computed with minimal delay by the network in order to react appropriately to the incoming object. We highlight our system’s results with a toy dart moving at 23.4 m/s with a 24.73° error in θ, 18.4 mm average discretized radius prediction error, and 25.03% median time to collision prediction error.
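Illustrative Sketch
The abstract's temporal encoding step (causal exponential filters over the event stream at multiple time scales) can be sketched as follows. This is an editorial illustration only: the filter form, polarity handling, decay constants, and per-pixel accumulation below are assumptions, not the authors' implementation. The paper feeds such multi-scale encodings to a CNN that predicts time to collision and a collision point on a discretized polar grid.

import numpy as np

def exponential_event_surfaces(events, t_query, taus, height, width):
    """Encode a DVS event stream into multi-scale exponential time surfaces.

    Minimal sketch: each pixel accumulates polarity * exp(-(t_query - t_event) / tau)
    over its past events, for several decay constants `taus`. Stacking one
    surface per tau yields a multi-channel image a CNN can consume.

    events : iterable of (t, x, y, polarity) tuples, polarity in {-1, +1}
    t_query: time (seconds) at which the surfaces are evaluated
    taus   : decay constants in seconds, e.g. [0.001, 0.01, 0.1]
    """
    surfaces = np.zeros((len(taus), height, width), dtype=np.float32)
    for t, x, y, polarity in events:
        dt = t_query - t
        if dt < 0:
            continue  # causal filter: ignore events after the query time
        for k, tau in enumerate(taus):
            surfaces[k, y, x] += polarity * np.exp(-dt / tau)
    return surfaces

# Example usage (resolution and time constants are placeholder values):
# surfaces = exponential_event_surfaces(events, t_query=0.05,
#                                       taus=[0.001, 0.01, 0.1],
#                                       height=240, width=320)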