Academic Article

Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers
Document Type
Working Paper
Source
Subject
Electrical Engineering and Systems Science - Audio and Speech Processing
Computer Science - Computation and Language
Computer Science - Sound
Language
English
Abstract
In this paper, we seek solutions for reducing the computation complexity of transformer-based models for speech representation learning. We evaluate 10 attention algorithms; then, we pre-train the transformer-based model with those attention algorithms in a self-supervised fashion and treat them as feature extractors on downstream tasks, including phoneme classification and speaker classification. With the assistance of t-SNE, PCA, and some observations, the attention weights in self-supervised audio transformers can be categorized into four general cases. Based on these cases and some analyses, we are able to use a specific set of attention weights to initialize the model. Our approach shows performance comparable to typical self-attention, yet requires 20% less time in both training and inference.
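To make the central idea concrete, below is a minimal sketch (not the authors' code) of an attention layer whose attention weights are input-independent: the per-head attention maps are learned parameters shared across all inputs, rather than being computed from query and key projections, which removes the quadratic QK^T computation. The class name `InputIndependentAttention` and the parameters `max_len` and `n_heads` are illustrative assumptions, not names from the paper.

```python
# Minimal sketch of input-independent attention, assuming PyTorch.
import torch
import torch.nn as nn


class InputIndependentAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        self.n_heads = n_heads
        # One learned attention map per head, shared across all inputs
        # (hypothetical parameterization; the paper derives initial weights
        # from observed attention patterns).
        self.attn_logits = nn.Parameter(torch.randn(n_heads, max_len, max_len) * 0.02)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        head_dim = d_model // self.n_heads
        v = self.v_proj(x).view(batch, seq_len, self.n_heads, head_dim).transpose(1, 2)
        # Slice the learned logits to the current sequence length; no Q/K
        # projections are computed, so attention cost does not depend on the input.
        attn = torch.softmax(self.attn_logits[:, :seq_len, :seq_len], dim=-1)
        out = torch.einsum("hqk,bhkd->bhqd", attn, v)
        out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out_proj(out)


if __name__ == "__main__":
    layer = InputIndependentAttention(d_model=768, n_heads=12, max_len=1500)
    feats = torch.randn(2, 400, 768)  # e.g., two utterances of 400 acoustic frames
    print(layer(feats).shape)  # torch.Size([2, 400, 768])
```

In this sketch the learned maps could be initialized from the four general cases of attention weights the paper identifies, so that the layer starts from patterns resembling those of a pre-trained self-attention model.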