Academic Paper

“Are You Playing a Shooter Again?!” Deep Representation Learning for Audio-Based Video Game Genre Recognition
Document Type
Periodical
Source
IEEE Transactions on Games (IEEE Trans. Games). 12(2):145-154, Jun. 2020
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Games
Feature extraction
Task analysis
Acoustics
Monitoring
YouTube
Sports
Audio classification
convolutional neural network (CNN)
deep learning
game genre classification
Language
ISSN
2475-1502
2475-1510
Abstract
In this paper, we present a novel computer audition task: audio-based video game genre classification. The aim of this study is threefold: 1) to check the feasibility of the proposed task; 2) to introduce a new corpus, the Game Genre by Audio + Multimodal Extracts (G$^{2}$AME), collected entirely from social multimedia; and 3) to compare the efficacy of various acoustic feature spaces for classifying the G$^{2}$AME corpus into six game genres using a linear support vector machine classifier. For the classification, we extract three different feature representations from the game audio files: 1) knowledge-based acoustic features; 2) Deep Spectrum features; and 3) quantized Deep Spectrum features using Bag-of-Audio-Words. The Deep Spectrum features are a deep-learning-based representation derived by forwarding visual representations of the audio instances (in particular spectrograms, mel-spectrograms, chromagrams, and their deltas) through deep, task-independent, pretrained CNNs. Specifically, activations of fully connected layers from three common image classification CNNs (GoogLeNet, AlexNet, and VGG16) are used as feature vectors. Results for the six-genre classification problem indicate the suitability of our deep learning approach for this task. Our best method achieves an unweighted average recall of up to 66.9% using tenfold cross-validation.
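The abstract reports its result as unweighted average recall (UAR), which is the mean of the per-class recalls, so each genre counts equally however many recordings it has. The following is a minimal sketch of that metric only, not the authors' evaluation code. The function name, the genre labels, and the toy predictions are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    # Recall of a class = correct predictions / all samples of that class.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: 8 "shooter" clips and 2 "sports" clips.
y_true = ["shooter"] * 8 + ["sports"] * 2
y_pred = ["shooter"] * 8 + ["shooter", "sports"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the plain accuracy is 0.9 while the UAR is 0.75, because the poorly recognized minority class pulls the average down. With six classes, chance-level UAR is about 16.7%, which gives context for the reported 66.9%.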