Academic Paper

Using the Short-Time Fourier Transform and ResNet to Diagnose Depression from Speech Data

Document Type

Conference

Author

Elfaki, Ayman; Asnawi, Ani Liza; Jusoh, Ahmad Zamani; Ismail, Ahmad Fadzil; Ibrahim, Siti Noorjannah; Mohamed Azmin, Nor Fadhillah; Wahidah Binti Nik Hashim, Nik Nur

Source

2021 IEEE International Conference on Computing (ICOCO), pp. 372-376, Nov. 2021

Subject

Computing and Processing
Training
Fourier transforms
Pandemics
Computational modeling
Neural networks
Computer architecture
Depression
Speech
Deep Learning
Short-Time Fourier Transform

Language

Abstract

Depression is a common illness that affects many people, particularly since the onset of the COVID-19 pandemic. It often arises when a person has difficulty coping with stressful life events. It can occur at any point in a person's life, and it pervades all aspects of life. Currently, depression diagnosis relies on patient interviews and self-report questionnaires, which depend heavily on the patient's honesty and the clinician's subjective experience. In this paper, we investigate the viability of using the Short-Time Fourier Transform (STFT) as a feature descriptor for objectively diagnosing depression from speech data. The dataset used in this research is the Audio/Visual Emotion Challenge 2017 (AVEC2017) dataset. The model is a modified ResNet18 architecture that performs binary classification (i.e., depressed or non-depressed). The STFT is computed from the speech signal to generate a mel-spectrogram for training and testing the model. The experiment shows that relying solely on the STFT as an input feature yields an F1 score of 74.71% in classifying depression.
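The feature-extraction step described in the abstract (STFT of the speech signal, then a mel-spectrogram) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the 16 kHz sample rate, the 512-sample Hann window, the 128-sample hop, and the 64 mel bands are all assumptions chosen for illustration, since the abstract does not state the parameters used. The classifier (the modified ResNet18) is not shown.

```python
import numpy as np


def stft_magnitude(signal, n_fft=512, hop=128):
    """Magnitude STFT: Hann-windowed frames, then |rFFT|.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    Window length and hop size are illustrative assumptions.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=128, n_mels=64):
    """Log-scaled mel-spectrogram in dB, shape (n_mels, n_frames)."""
    power = stft_magnitude(signal, n_fft, hop) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ power
    return 10.0 * np.log10(mel + 1e-10)  # small offset avoids log(0)


if __name__ == "__main__":
    # Synthetic 1-second, 440 Hz tone standing in for a speech segment.
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)
    print(log_mel_spectrogram(tone, sr=sr).shape)
```

With these assumed settings, a one-second signal sampled at 16 kHz produces a 64 x 122 array. A two-dimensional array of this kind is what an image classifier such as a ResNet18 variant would take as input, typically after resizing or stacking into the channel layout the network expects.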

