Academic Paper

Bipolar Population Threshold Encoding for Audio Recognition with Deep Spiking Neural Networks
Document Type
Conference
Source
2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, Jun. 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Power demand
Neuromorphics
Biological system modeling
Sociology
Neurons
Encoding
Hardware
spiking neural network
neural encoding
audio recognition
spatio-temporal
Language
English
ISSN
2161-4407
Abstract
Spiking Neural Networks (SNNs) have been increasingly investigated for audio recognition because of their low power consumption on neuromorphic hardware, which mimics biological neural systems. Since SNNs learn from spikes, a critical step lies in the efficient neural encoding of real-valued sound signals to represent the complex temporal patterns in speech and environmental sounds. In this paper, we propose a novel Bipolar Population Threshold (BPT) encoding model that effectively captures the trajectory information of time-series speech data by combining temporal and spatial dimensions. The bipolar encoding technique uses positive and negative neurons to capture dynamic changes in the audio signal, while the threshold intervals allow a sparse representation that focuses on encoding significant changes, resulting in an efficient and simplified recognition process. In extensive experiments on three benchmark datasets, TIDIGITS (speech), RWCP (environmental sounds), and MedleyDB (music), the proposed method consistently outperforms state-of-the-art approaches while using fewer spikes, especially in capturing the complex spatio-temporal patterns of audio signals.
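The abstract's description of positive/negative neurons firing on significant signal changes suggests a delta-style bipolar threshold rule. The sketch below is only an illustrative assumption of such a scheme (the function name encode_bipolar, the per-threshold reference update, and the toy feature trace are all hypothetical), not the paper's actual BPT formulation.

```python
# Minimal sketch, assuming a send-on-delta-style bipolar rule with a
# population of threshold levels; not the paper's exact BPT definition.
import numpy as np

def encode_bipolar(signal, thresholds):
    """Convert a real-valued feature stream into positive/negative spike trains.

    signal:     1-D array of feature values over time.
    thresholds: 1-D array of positive threshold levels (one neuron pair each).

    Returns a (T, 2 * len(thresholds)) binary array: for each threshold level,
    one "positive" neuron fires when the signal has risen by at least that
    amount since the pair's last spike, and one "negative" neuron fires when
    it has fallen by at least that amount.
    """
    T = len(signal)
    n = len(thresholds)
    spikes = np.zeros((T, 2 * n), dtype=np.uint8)
    baseline = np.full(n, signal[0])        # per-pair reference value
    for t in range(1, T):
        delta = signal[t] - baseline
        pos = delta >= thresholds           # upward change exceeds threshold
        neg = delta <= -thresholds          # downward change exceeds threshold
        spikes[t, :n] = pos
        spikes[t, n:] = neg
        fired = pos | neg
        baseline[fired] = signal[t]         # reset reference for pairs that fired
    return spikes

# Example: encode a toy time-varying feature with three threshold levels.
if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 200)
    feature = np.sin(2 * np.pi * 3 * t) * t
    out = encode_bipolar(feature, np.array([0.05, 0.1, 0.2]))
    print("spike counts per neuron:", out.sum(axis=0))
```

Under this assumed rule, larger thresholds fire less often, which is one plausible way to obtain the sparse, change-focused representation the abstract describes.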