Academic Paper

Vocal-Feature-Based Classification of Post-Laryngectomy Patients for Rehabilitation Monitoring
Document Type
Periodical
Source
IEEE Transactions on Instrumentation and Measurement (IEEE Trans. Instrum. Meas.), 72:1-9, 2023
Subject
Power, Energy and Industry Applications
Components, Circuits, Devices and Systems
Feature extraction
Harmonic analysis
Task analysis
Indexes
Surgery
Stability criteria
Reliability
Cepstral analysis
classification models
laryngectomy
objective evaluation
rehabilitation
voice analysis
voice assessment
Language
ISSN
0018-9456
1557-9662
Abstract
This article analyzes substitution voices in patients who underwent partial laryngectomy for laryngeal cancer, with the aim of identifying a reliable methodology for objectively evaluating post-intervention phonatory impairment and the effectiveness of rehabilitation therapies. The dataset includes 85 patients who underwent open partial horizontal laryngectomy (OPHL) of type I (22 subjects), type II (32 subjects), or type III (31 subjects). The available vocal material (a reading task and a sustained vowel) was preprocessed with two different algorithms to remove nonharmonic frames from the patients' recordings. After this preliminary step, a series of features belonging to the time, spectral, and cepstral domains was extracted from the selected harmonic frames. Two binary comparisons were then made: OPHL-I versus OPHL-II + III, and OPHL-II + III ($I < 5$) versus OPHL-II + III ($I \geq 5$), where $I$ is the intelligibility index of the auditory-perceptual scale intelligibility, noise, fluency, and voicing (INFVo), assessed during a preliminary evaluation. Two feature-selection techniques, based respectively on comparing the probability distributions of the extracted features and on the classification performance of a logistic regression (LR) model, identified the features with the best discrimination capability: harmonic-to-noise ratio (HNR), fundamental frequency, spectral kurtosis, spectral entropy, and mel-frequency cepstral coefficients (MFCCs). The best classification accuracy, 96.5% under fivefold cross-validation, was obtained for the comparison OPHL-I versus OPHL-II + III using an LR model trained on the 5th and 95th percentiles of the fundamental frequency and the 95th percentile of the spectral entropy, all extracted from the reading task.
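The final step described in the abstract (summarizing frame-level features by percentiles and scoring an LR model with fivefold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. The frame-level data are synthetic, the class means and spreads are arbitrary placeholders rather than values from the paper, and the helper names `summarize_recording` and `synthetic_recording` are hypothetical. Feature standardization and stratified folds are also assumptions that the abstract does not state. Only the class sizes (22 versus 32 + 31 = 63) and the three percentile descriptors are taken from the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def summarize_recording(f0_frames, entropy_frames):
    """Collapse frame-level tracks into three per-recording descriptors:
    5th and 95th percentile of F0, 95th percentile of spectral entropy."""
    return np.array([
        np.percentile(f0_frames, 5),
        np.percentile(f0_frames, 95),
        np.percentile(entropy_frames, 95),
    ])


def synthetic_recording(rng, f0_mean, entropy_mean, n_frames=200):
    """Hypothetical stand-in for the harmonic frames of one reading task.
    All numeric parameters here are placeholders, not measured values."""
    subject_f0 = f0_mean + rng.normal(0.0, 10.0)  # between-subject jitter
    f0 = rng.normal(subject_f0, 15.0, n_frames)
    entropy = rng.normal(entropy_mean, 0.05, n_frames)
    return f0, entropy


rng = np.random.default_rng(0)

# (label, number of subjects, placeholder F0 mean, placeholder entropy mean)
# Label 0 = OPHL-I (22 subjects), label 1 = OPHL-II + III (63 subjects).
classes = [(0, 22, 150.0, 0.60), (1, 63, 110.0, 0.75)]

rows, labels = [], []
for label, n_subjects, f0_mean, entropy_mean in classes:
    for _ in range(n_subjects):
        f0, entropy = synthetic_recording(rng, f0_mean, entropy_mean)
        rows.append(summarize_recording(f0, entropy))
        labels.append(label)

X = np.vstack(rows)
y = np.array(labels)

# Assumption: standardize features before LR and stratify the five folds.
model = make_pipeline(StandardScaler(), LogisticRegression())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"mean fivefold accuracy on synthetic data: {scores.mean():.3f}")
```

The accuracy printed by this sketch reflects only the synthetic placeholders and says nothing about the 96.5% reported in the abstract. With real recordings, `synthetic_recording` would be replaced by whatever pitch and spectral-entropy extraction is applied to the retained harmonic frames.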