Academic Article

Multi-Angle Fusion for Low-Cost Near-Field Ultrasonic in-Air Gesture Recognition
Document Type
Periodical
Source
IEEE Access, vol. 8, pp. 191204-191218, 2020
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Acoustics
Array signal processing
Gesture recognition
Microwave integrated circuits
Ultrasonic imaging
Doppler shift
Microphones
signal processing
microphones
deep learning
human-machine interface
Language
English
ISSN
2169-3536
Abstract
With the growing interest in human-machine interfaces (HMI), increasing effort is being made to provide always-on, low-cost, touchless control for Internet-of-Things (IoT) edge devices. In this study, we explore near-audio ultrasound for in-air ultrasonic gesture design and recognition. We propose beamforming followed by feature-extraction stages and a Temporal Convolution Network (TCN) for classification. The study is applied to a small-form-factor concentric hexagonal array of 7 microphones, where a beamforming stage provides spatial feature extraction and fusion of ultrasonic gesture features from different angles. With such a limited number of microphones, we show that a customized Filter-and-Sum (FaS) beamformer with a set of 5-tap filters is well suited to this application. We optimize the beamformer by fitting a fixed beam in the ultrasonic frequency domain, making the ultrasonic band of interest (18 kHz - 24 kHz) available for use. As a hand gesture is performed near the array, the beamformer generates parallel time readings of Doppler shifts from a set of assigned angles. A TCN of only 10k parameters classifies these parallel readings into predefined symbols that build a gesture alphabet. The TCN operates in two modes that share the same structure, with the option to switch between them by loading a different set of coefficients. Features are learned from concatenated beamformed frequency points, with per-symbol classification accuracy in the range of 92%-100% computed on a test set and visualized as a normalized confusion matrix. The proposed system gives users a degree of flexibility: gesture diversity can be obtained by grouping trained symbols from the built alphabet in a post-training design stage. This paves the way for flexible, intuitive, and easy-to-remember gestures.
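
To make the Filter-and-Sum idea concrete, below is a minimal NumPy sketch of an FaS beamformer for the 7-microphone, 5-tap configuration the abstract describes. The filter coefficients, steering angles, sample rate, and the helper name `filter_and_sum` are illustrative assumptions; the paper's actual coefficients are fitted offline to form a fixed beam in the 18 kHz - 24 kHz band.

```python
import numpy as np

# Minimal FaS sketch, assuming a 7-mic array and 5-tap FIR filters per mic.
# Random coefficients stand in for the paper's optimized fixed-beam filters.
N_MICS, N_TAPS = 7, 5

def filter_and_sum(x, h):
    """x: (n_mics, n_samples) mic signals; h: (n_mics, n_taps) FIR filters
    for one steering angle. Filters each channel, then sums across mics."""
    return sum(np.convolve(x[m], h[m], mode="same") for m in range(x.shape[0]))

# Hypothetical usage: one beam per assigned angle, computed in parallel.
rng = np.random.default_rng(0)
x = rng.standard_normal((N_MICS, 48000))                # 1 s of audio at 48 kHz
h_per_angle = rng.standard_normal((4, N_MICS, N_TAPS))  # 4 assumed steering angles
beams = np.stack([filter_and_sum(x, h) for h in h_per_angle])
print(beams.shape)  # (4, 48000): parallel readings, one per angle
```

Each beam's Doppler content around the emitted tone would then be extracted per frame to form the parallel readings fed to the classifier.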
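
For the classification stage, here is a small PyTorch sketch of a TCN-style classifier near the ~10k-parameter budget quoted in the abstract. The layer widths, dilation schedule, input feature count (concatenated beamformed frequency points), and symbol count are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TinyTCN(nn.Module):
    """Sketch of a small TCN classifier; sizes are assumed, not the paper's."""
    def __init__(self, in_feats=24, n_symbols=8, ch=32):
        super().__init__()
        layers = []
        for d in (1, 2, 4):  # stack of dilated temporal convolutions
            layers += [
                nn.Conv1d(in_feats if d == 1 else ch, ch,
                          kernel_size=3, dilation=d, padding=d),
                nn.ReLU(),
            ]
        self.tcn = nn.Sequential(*layers)
        self.head = nn.Linear(ch, n_symbols)

    def forward(self, x):             # x: (batch, in_feats, time)
        h = self.tcn(x)               # (batch, ch, time)
        return self.head(h.mean(-1))  # pool over time, score each symbol

model = TinyTCN()
print(sum(p.numel() for p in model.parameters()))  # ~8.8k here; exact layout not given
x = torch.randn(2, 24, 100)  # 2 examples, 24 frequency points, 100 frames
print(model(x).shape)        # (2, 8): logits over the gesture alphabet
```

The two operating modes described in the abstract would reuse this one structure, swapping in a different trained set of coefficients via the usual state-dict loading rather than changing the network itself.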