Academic Paper

Fast Spectrogram Inversion Using Multi-Head Convolutional Neural Networks
Document Type
Periodical
Source
IEEE Signal Processing Letters (IEEE Signal Process. Lett.), 26(1):94-98, Jan. 2019
Subject
Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
Spectrogram
Convolution
Time-frequency analysis
Training
Convolutional neural networks
Adaptation models
Signal processing algorithms
Phase reconstruction
deep learning
convolutional neural networks
short-time Fourier transform
spectrogram
time-frequency signal processing
speech synthesis
Language
ISSN
1070-9908
1558-2361
Abstract
We propose the multi-head convolutional neural network (MCNN) for waveform synthesis from spectrograms. MCNN performs nonlinear interpolation using transposed convolution layers arranged in parallel heads. It makes significantly better use of modern multi-core processors than commonly used iterative algorithms such as Griffin–Lim, and runs very fast (more than 300× real time). To train MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN is a very promising approach to high-quality speech synthesis that requires no iterative algorithm or autoregression in its computations.
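The abstract describes MCNN only at a high level. The sketch below illustrates the general idea of a multi-head transposed-convolution upsampler in plain Python. Each head turns spectrogram frames into waveform samples through stacked transposed convolutions, and the head outputs are then mixed into a single waveform. This is a minimal, untrained forward pass under stated assumptions, not the authors' implementation. The ELU activations, the scalar-weighted sum of heads, the softsign output, the channel widths, kernel size 8, stride 4, and all helper names (`transposed_conv1d`, `run_head`, `mcnn_forward`, `random_layers`) are hypothetical choices made for illustration.

```python
import math
import random

def transposed_conv1d(x, w, stride):
    """1-D transposed convolution (no bias).

    x: C_in channels, each a list of T values.
    w: w[ci][co] is a length-K kernel from input channel ci to output channel co.
    Returns C_out channels, each of length (T - 1) * stride + K.
    """
    c_in, t = len(x), len(x[0])
    c_out, k = len(w[0]), len(w[0][0])
    out = [[0.0] * ((t - 1) * stride + k) for _ in range(c_out)]
    for ci in range(c_in):
        for i, v in enumerate(x[ci]):
            # Each input frame scatters a scaled copy of the kernel, offset by i * stride.
            for co in range(c_out):
                for j, wk in enumerate(w[ci][co]):
                    out[co][i * stride + j] += v * wk
    return out

def elu(ch):
    """ELU nonlinearity (an assumed activation) applied elementwise to channels."""
    return [[v if v > 0 else math.expm1(v) for v in row] for row in ch]

def run_head(spec, layers, stride):
    """One head: stacked transposed convolutions, ELU between layers, 1 output channel."""
    h = spec
    for n, w in enumerate(layers):
        h = transposed_conv1d(h, w, stride)
        if n < len(layers) - 1:
            h = elu(h)
    return h[0]

def mcnn_forward(spec, heads, scales, stride):
    """Assumed mixing rule: scalar-weighted sum of heads, squashed into (-1, 1) by softsign."""
    outs = [run_head(spec, layers, stride) for layers in heads]
    mixed = [sum(s * o[n] for s, o in zip(scales, outs)) for n in range(len(outs[0]))]
    return [y / (1.0 + abs(y)) for y in mixed]

def random_layers(channels, kernel, rng):
    """Hypothetical helper: untrained random weights for one head, indexed [layer][ci][co][k]."""
    return [[[[rng.gauss(0.0, 0.1) for _ in range(kernel)]
              for _ in range(c_out)] for _ in range(c_in)]
            for c_in, c_out in zip(channels[:-1], channels[1:])]

rng = random.Random(0)
n_freq, n_frames = 33, 10
# Toy magnitude spectrogram: n_freq channels by n_frames frames.
spec = [[abs(rng.gauss(0.0, 1.0)) for _ in range(n_frames)] for _ in range(n_freq)]
channels = [n_freq, 16, 8, 1]  # assumed channel widths, not the paper's
heads = [random_layers(channels, kernel=8, rng=rng) for _ in range(2)]
wave = mcnn_forward(spec, heads, scales=[1.0, 1.0], stride=4)
print(len(wave))  # 724
```

With stride 4 and kernel 8, each layer maps T frames to (T - 1)·4 + 8 samples, so three layers take 10 frames to 44, 180, and finally 724 samples. Every head, and every output sample within a layer, can be computed without waiting on previously generated samples, which is why this style of model parallelizes well compared with iterative phase-reconstruction methods such as Griffin–Lim.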