Academic Paper

Transformer VAE: A Hierarchical Model for Structure-Aware and Interpretable Music Representation Learning
Document Type
Conference
Source
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 516-520, May 2020
Subject
Signal Processing and Analysis
Human computer interaction
Conferences
Signal processing algorithms
Signal processing
Acoustics
Speech processing
Context modeling
Representation learning
VAE
Transformer
music structure
Language
English
ISSN
2379-190X
Abstract
Structure awareness and interpretability are two of the most desired properties of music generation algorithms. Structure-aware models generate more natural and coherent music with long-term dependencies, while interpretable models are better suited to human-computer interaction and co-creation. To achieve both goals simultaneously, we designed the Transformer Variational AutoEncoder, a hierarchical model that unifies two recent breakthroughs in deep music generation: 1) the Music Transformer and 2) Deep Music Analogy. The former learns long-term dependencies using an attention mechanism, and the latter learns interpretable latent representations using a disentangled conditional VAE. We show that the Transformer VAE is essentially capable of learning a context-sensitive hierarchical representation, treating local representations as the context and the dependencies among those local representations as the global structure. By interacting with the model, we can achieve context transfer, realizing the hypothetical situation of "what if" a piece were developed following the music flow of another piece.
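For intuition, the following is a minimal PyTorch sketch of the hierarchical idea the abstract describes: a small VAE encodes each music segment (e.g., a bar of note tokens) into a local latent, and a Transformer encoder models the dependencies among those latents as the global structure. This is not the authors' implementation; all module names, layer choices, dimensions, and the token vocabulary size are illustrative assumptions.

```python
# Sketch of a hierarchical "Transformer VAE": local segment VAEs supply the
# context; a Transformer over the local latents captures the global structure.
# All hyperparameters and module choices here are assumptions, not the paper's.

import torch
import torch.nn as nn

class SegmentVAE(nn.Module):
    """Encodes one segment of note tokens into a latent z and decodes it back."""
    def __init__(self, vocab=130, emb=64, hid=128, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.enc = nn.GRU(emb, hid, batch_first=True)
        self.to_mu = nn.Linear(hid, z_dim)
        self.to_logvar = nn.Linear(hid, z_dim)
        self.z_to_h = nn.Linear(z_dim, hid)
        self.dec = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def encode(self, tokens):                      # tokens: (B, T) int64
        _, h = self.enc(self.embed(tokens))        # h: (1, B, hid)
        h = h.squeeze(0)
        return self.to_mu(h), self.to_logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def decode(self, z, tokens):                   # teacher-forced reconstruction
        h0 = self.z_to_h(z).unsqueeze(0)           # (input shifting omitted)
        out, _ = self.dec(self.embed(tokens), h0)
        return self.out(out)                       # (B, T, vocab) logits

class TransformerVAESketch(nn.Module):
    """Local latents as context; a Transformer models their global structure."""
    def __init__(self, z_dim=32, n_head=4, n_layer=2):
        super().__init__()
        self.vae = SegmentVAE(z_dim=z_dim)
        layer = nn.TransformerEncoderLayer(d_model=z_dim, nhead=n_head,
                                           batch_first=True)
        self.structure = nn.TransformerEncoder(layer, n_layer)

    def forward(self, segments):                   # segments: (B, S, T)
        B, S, T = segments.shape
        mu, logvar = self.vae.encode(segments.reshape(B * S, T))
        z = self.vae.reparameterize(mu, logvar).reshape(B, S, -1)
        z_ctx = self.structure(z)                  # context-sensitive latents
        logits = self.vae.decode(z_ctx.reshape(B * S, -1),
                                 segments.reshape(B * S, T))
        return logits.reshape(B, S, T, -1), mu, logvar

# Toy usage: 2 pieces, 8 segments each, 16 tokens per segment.
x = torch.randint(0, 130, (2, 8, 16))
logits, mu, logvar = TransformerVAESketch()(x)
```

In this toy framing, the "context transfer" mentioned in the abstract would roughly correspond to running the structure Transformer over one piece's sequence of local latents while decoding with another piece's segment content; the paper's actual conditioning scheme may differ.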