Academic Paper

Modeling Obstructive Sleep Apnea Voices Using Deep Neural Network Embeddings and Domain-Adversarial Training
Document Type
Periodical
Source
IEEE Journal of Selected Topics in Signal Processing (IEEE J. Sel. Top. Signal Process.), 14(2):240-250, Feb. 2020
Subject
Signal Processing and Analysis
Sleep
Hidden Markov models
Neural networks
Indexes
Speaker recognition
Training
Obstructive sleep apnea (OSA)
speech
deep neural network embeddings
domain-adversarial training
Language
ISSN
1932-4553
1941-0484
Abstract
Obstructive Sleep Apnea (OSA) is a sleep breathing disorder affecting at least 3–7% of male adults and 2–5% of female adults between 30 and 70 years of age. It causes recurrent episodes of partial or total obstruction at the level of the pharynx, which interrupt breathing during sleep. The number of obstruction episodes per hour of sleep, known as the Apnea-Hypopnea Index (AHI), together with the degree of daytime sleepiness, determines the severity of OSA. OSA is usually diagnosed in a hospital Sleep Unit by means of the time-consuming polysomnography (PSG) test. Because the altered upper-airway structure of OSA patients is expected to have anatomical and physiological effects on their voices, assessing OSA from speech has been proposed as a simple aid to the diagnostic process. In this paper, we review previous research on assessing OSA from speech and underline the difficulty posed by the weak connection between OSA and speech. We present results on modeling OSA using Deep Learning, to the best of our knowledge for the first time, on the largest existing database of OSA voice recordings and speakers’ clinical variables. Using two state-of-the-art speaker recognition techniques, acoustic subspace modeling (i-vectors) and deep neural network embeddings (x-vectors), we confirm the weak connection between speech and OSA. We hypothesize that this weak effect is mediated by undesired sources of variability such as speakers’ age, body mass index (BMI), or height, and we propose Domain-Adversarial Training (DAT) to remove them. Our results show that, with BMI as the adversarial domain, accuracy in classifying voices from extreme OSA cases (AHI $\leq$ 10 vs. AHI $\geq$ 30) increases from 69.39% to 76.60%. We hope these results encourage the use of domain-adversarial neural networks to remove the undesired effects of clinical variables or other speaker factors when assessing health disorders from speech.
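The abstract names Domain-Adversarial Training but does not describe its mechanics. The sketch below illustrates the general idea of gradient reversal on a deliberately tiny model, and is not the authors' architecture. Everything in it is an assumption made for illustration: the function name `dat_gradients`, a linear shared feature extractor, scalar-weight heads, and a binary domain label (for example a hypothetical high/low BMI group). The record does not say how BMI was encoded as a domain.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dat_gradients(x, y_osa, y_dom, w, u, v, lam):
    """Per-sample gradients for a toy domain-adversarial model.

    Hypothetical setup (not taken from the paper):
      x     : input features, e.g. a speaker embedding
      w     : weights of a shared linear extractor, h = w . x
      u, v  : scalar weights of the OSA head and the domain head
      y_osa : 1 for a severe case, 0 for a mild case
      y_dom : binary domain label (e.g. an assumed BMI group)
      lam   : gradient-reversal strength (0 disables the adversary)
    """
    # Forward pass: shared feature, then two sigmoid heads.
    h = sum(wi * xi for wi, xi in zip(w, x))
    p_osa = sigmoid(u * h)
    p_dom = sigmoid(v * h)

    # For binary cross-entropy, dLoss/dlogit = (prediction - target).
    d_osa = p_osa - y_osa
    d_dom = p_dom - y_dom

    # Each head descends its own loss as usual.
    grad_u = d_osa * h
    grad_v = d_dom * h

    # Gradients of each loss with respect to the shared feature h.
    gh_osa = d_osa * u
    gh_dom = d_dom * v

    # Gradient reversal: the domain gradient reaches the extractor with
    # its sign flipped and scaled by lam, so the extractor is pushed to
    # make the domain harder to predict while still serving the OSA head.
    gh = gh_osa - lam * gh_dom
    grad_w = [gh * xi for xi in x]
    return grad_w, grad_u, grad_v
```

With `lam = 0` the extractor gradient reduces to that of an ordinary OSA classifier. Increasing `lam` adds a term that works against the domain head, which is the mechanism by which adversarial training is meant to suppress domain information in the shared features.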