Academic Paper

Explicit Content Detection in Music Lyrics Using Machine Learning
Document Type
Conference
Source
2018 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 517-521, Jan. 2018
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Dictionaries
Machine learning
Vocabulary
Filtering
Companies
Broadcasting
Bagging
Machine Learning
NLP
Explicit Contents
Music
Lyrics
Abusive Language
Adolescent Safety
Parental Advisory Label
Language
ISSN
2375-9356
Abstract
Music has a serious effect on children's development, and music lyrics have become more violent and sexual over the years. However, existing systems for filtering explicit content in music often do not work properly, and doing the job properly takes a great deal of time and effort. In this study, we propose several machine learning models that automatically detect explicit content in Korean lyrics and compare their performance. The proposed model, Bagging with a selective vocabulary, outperformed not only the other models we designed but also filtering with a manually compiled profanity dictionary, a method widely used in the industry to detect explicit content. The proposed automated lyrics screening approach makes a practical contribution to the music industry by significantly reducing the time and effort needed to screen out content that is harmful to young listeners. The approach generalizes to other languages as long as the same kinds of data used in this study are available.
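The contrast the abstract draws between a dictionary filter and a bagged classifier over a selected vocabulary can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the toy English tokens stand in for Korean lyrics, whitespace splitting stands in for proper Korean tokenization, the base learner is a small Bernoulli naive Bayes model, and the vocabulary selection rule (largest gap in per-class document frequency) and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokens; Korean lyrics would need a morphological analyzer.
    return text.lower().split()


def dictionary_filter(text, profanity):
    """Baseline: flag a lyric if any token is in a hand-made word list."""
    return any(tok in profanity for tok in tokenize(text))


def select_vocabulary(samples, top_k):
    """Hypothetical selection rule: keep the top_k words whose document
    frequency differs most between the explicit (1) and clean (0) class."""
    doc_freq = {0: Counter(), 1: Counter()}
    for text, label in samples:
        doc_freq[label].update(set(tokenize(text)))
    words = set(doc_freq[0]) | set(doc_freq[1])
    ranked = sorted(words, key=lambda w: (-abs(doc_freq[1][w] - doc_freq[0][w]), w))
    return set(ranked[:top_k])


def train_bernoulli_nb(samples, vocabulary):
    """Fit word-presence probabilities per class with add-one smoothing."""
    n_docs = Counter()
    doc_freq = {0: Counter(), 1: Counter()}
    for text, label in samples:
        n_docs[label] += 1
        doc_freq[label].update(set(tokenize(text)) & vocabulary)
    model = {}
    for label in (0, 1):
        prior = (n_docs[label] + 1) / (len(samples) + 2)
        present = {w: (doc_freq[label][w] + 1) / (n_docs[label] + 2) for w in vocabulary}
        model[label] = (prior, present)
    return model


def predict_nb(model, text):
    """Return 1 (explicit) if the explicit class has the higher log score."""
    tokens = set(tokenize(text))
    scores = {}
    for label, (prior, present) in model.items():
        score = math.log(prior)
        for word, p in present.items():
            score += math.log(p if word in tokens else 1.0 - p)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def train_bagging(samples, vocabulary, n_models=15, seed=0):
    """Bagging: train each base model on a bootstrap resample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_bernoulli_nb(bootstrap, vocabulary))
    return models


def predict_bagging(models, text):
    """Aggregate the base models by majority vote."""
    votes = sum(predict_nb(m, text) for m in models)
    return 1 if votes * 2 > len(models) else 0


# Toy placeholder data (label 1 = explicit, 0 = clean); not from the paper.
TRAIN = [
    ("love you under the moon", 0),
    ("dance with me all night", 0),
    ("walking home in the rain", 0),
    ("gun blood fight tonight", 1),
    ("blood on the street fight", 1),
    ("gun smoke and blood", 1),
]

if __name__ == "__main__":
    vocab = select_vocabulary(TRAIN, top_k=3)
    models = train_bagging(TRAIN, vocab)
    for lyric in ["gun blood fight", "blood fight", "love you"]:
        print(lyric, dictionary_filter(lyric, {"gun"}), predict_bagging(models, lyric))
```

In this sketch the dictionary filter can only flag words someone has already listed, while the bagged model learns its word evidence from labeled lyrics. That is one plausible reading of why a learned model could outperform a hand-made list, though the abstract itself does not state the reason.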