Academic Paper
“Are You Playing a Shooter Again?!” Deep Representation Learning for Audio-Based Video Game Genre Recognition
Document Type
Periodical
Source
IEEE Transactions on Games, 12(2):145-154, Jun. 2020
ISSN
2475-1502
2475-1510
Abstract
In this paper, we present a novel computer audition task: audio-based video game genre classification. The aim of this study is threefold: 1) to check the feasibility of the proposed task; 2) to introduce a new corpus, the Game Genre by Audio + Multimodal Extracts (G$^{2}$AME), collected entirely from social multimedia; and 3) to compare the efficacy of various acoustic feature spaces for classifying the G$^{2}$AME corpus into six game genres using a linear support vector machine classifier. For the classification, we extract three different feature representations from the game audio files: 1) knowledge-based acoustic features; 2) Deep Spectrum features; and 3) quantized Deep Spectrum features using Bag-of-Audio-Words. The Deep Spectrum features are a deep-learning-based representation obtained by forwarding visual representations of the audio instances (specifically spectrograms, mel-spectrograms, chromagrams, and their deltas) through deep, task-independent, pretrained CNNs. The activations of fully connected layers from three common image classification CNNs (GoogLeNet, AlexNet, and VGG16) are used as feature vectors. Results for the six-genre classification problem indicate the suitability of our deep learning approach for this task. Our best method achieves up to 66.9% unweighted average recall (UAR) under tenfold cross-validation.
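The abstract reports its result as unweighted average recall rather than plain accuracy. As a minimal sketch of that metric (not the authors' evaluation code), the snippet below averages the per-class recalls so that every genre counts equally regardless of how many samples it has. The function name `unweighted_average_recall` and the genre labels in the toy data are hypothetical; the abstract does not list the six genres.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example with hypothetical genre labels.
y_true = ["shooter", "shooter", "shooter", "shooter", "puzzle", "puzzle"]
y_pred = ["shooter", "shooter", "shooter", "shooter", "shooter", "puzzle"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, UAR={uar:.3f}")
```

In this toy case the majority class is always recognized, so accuracy (5 of 6 correct) looks better than UAR, which is pulled down by the 50% recall on the minority class. That sensitivity to class imbalance is the usual reason to prefer UAR on corpora whose classes differ in size.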