
Using Short Texts and Emojis to Predict the Gender of a Texter in Turkish
2019 4th International Conference on Computer Science and Engineering (UBMK), pp. 435-438, Sep. 2019
Computing and Processing
Signal Processing and Analysis
Machine Learning
Natural Language Processing
Gender Classification
With the advancement of technology and the spread of social media, people express their ideas, wishes and thoughts in digital environments through text messages, which have all but replaced telephone conversations. These texts have also become an important resource for data science. Useful information, such as an author's gender, can be extracted from them, which is valuable in areas like e-commerce and social media. However, online messaging platforms make it easy for users to falsify their name, age, gender and location in order to hide their true identity, and it is not uncommon for people to conceal their gender while texting. We think this can lead to undesirable consequences for both social life and security. Some features of a text, such as vocabulary and syntax, offer clues about the writer's gender. The aim of this study is to predict the gender of a Turkish-language texter using machine-learning text classification techniques. Since emojis are an important component of text messaging, they are also included in the prediction process. We achieved accuracy rates of 67.22%, 74.31% and 68.66% for gender prediction based on text-only, emoji-only, and combined text-and-emoji features, respectively.
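The abstract does not name the classifier or the preprocessing used, so the following is only a minimal sketch of how the three feature settings (text-only, emoji-only, text-and-emoji) could be compared. It assumes a multinomial Naive Bayes classifier over bag-of-words tokens, a common text-classification baseline that is not necessarily the method used in the study. The emoji regex, the function names (`extract_features`, `train_model`, `accuracy`), the `"F"`/`"M"` labels and the Turkish training messages are all hypothetical illustrations, not data or code from the paper.

```python
import math
import re
from collections import Counter

# Approximate emoji matcher (assumption): covers the main pictograph blocks
# U+1F300-U+1FAFF and miscellaneous symbols/dingbats U+2600-U+27BF. It does
# not handle every emoji, skin-tone modifier, or ZWJ sequence.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# \w matches Unicode letters, so Turkish characters (ç, ğ, ı, ö, ş, ü) are kept.
WORD_RE = re.compile(r"\w+")


def extract_features(message, mode):
    """Return tokens for one feature set: 'text', 'emoji' or 'both'."""
    emojis = EMOJI_RE.findall(message)
    # Note: str.lower() is not Turkish-locale aware ("I" -> "i", not "ı").
    words = [w.lower() for w in WORD_RE.findall(message)]
    if mode == "text":
        return words
    if mode == "emoji":
        return emojis
    return words + emojis


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set()
        self.totals, self.log_prior = {}, {}
        for c in self.classes:
            self.vocab.update(self.counts[c])
            self.totals[c] = sum(self.counts[c].values())
            self.log_prior[c] = math.log(labels.count(c) / len(labels))
        return self

    def predict(self, tokens):
        v = len(self.vocab) or 1  # guard against an empty vocabulary

        def log_score(c):
            s = self.log_prior[c]
            for t in tokens:
                s += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=log_score)


# Hypothetical toy messages, invented for illustration only.
TRAIN = [
    ("Bugün çok mutluyum canım 😍💕", "F"),
    ("Kızlarla alışverişe gidiyoruz 💅😍", "F"),
    ("Maç akşam saat sekizde kanka ⚽👍", "M"),
    ("Abi bu akşam maç var mı ⚽🍺", "M"),
]


def train_model(mode):
    docs = [extract_features(text, mode) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    return NaiveBayes().fit(docs, labels)


def accuracy(model, docs, labels):
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


if __name__ == "__main__":
    for mode in ("text", "emoji", "both"):
        model = train_model(mode)
        print(mode, model.predict(extract_features("Canım 😍", mode)))
```

Keeping emojis as separate tokens, rather than discarding them as punctuation, is what makes the emoji-only and combined settings possible with the same classifier. A real evaluation would, of course, measure accuracy on a held-out test set rather than on the training messages.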