Academic article

Cluster analysis and ensemble transfer learning for COVID-19 classification from computed tomography scans
Document Type
article
Source
IJAIN (International Journal of Advances in Intelligent Informatics), Vol 8, Iss 2, Pp 135-150 (2022)
Subject
covid-19
computed tomography
clustering
transfer learning
ensemble learning
Electronic computers. Computer science
QA75.5-76.95
Language
English
ISSN
2442-6571
2548-3161
Abstract
The paper presents a brief analysis of publications that use the public SARS-CoV-2 dataset, which consists of patients’ computed tomography scans collected in Brazilian hospitals, together with an experimental setup that addresses the data challenges identified. The analysis shows that all protocols, with one exception, suffer from data leakage arising from a data organization in which patients and their images are not grouped. Each patient is represented by several scans, so data from the same individual may occur in both the training and test sets, which can produce misleading results. Furthermore, only one paper proposed ensemble learning, using VGG-16, ResNet50, and Xception as base models. Therefore, we proposed and experimented with the following strategy to mitigate the identified risks of bias: data standardization and normalization to achieve proper contrast and resolution; k-means and group shuffle split to avoid data leakage; augmentation and ensemble transfer learning to deal with limited sample size and over-fitting. Compared with the earlier proposed ensemble approach, the current one stacks VGG-16, Densenet-201, and Inception v3 and achieves higher accuracy (99.3%), the second highest in the related work; most significantly, it applies augmentation and clustering analysis to avoid overestimation. The paper also reports metrics that are critical in the medical domain: negative predictive value (99.55%), false positive rate (0.89%), false negative rate (0.42%), and false discovery rate (0.83%). The strategy has two main advantages: reducing data pitfalls and decreasing generalization error. It can serve as a baseline to increase performance quality and mitigate the risk of bias in the field.
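
The two ideas in the abstract that are easiest to get wrong, a split that keeps every group on one side and the confusion-matrix metrics it lists, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the group labels (`p1`, `p2`, ...), and the confusion-matrix counts are hypothetical. The sketch also assumes that a group identifier per scan is already available (for example, a cluster id), and it uses the standard definitions of the four metrics.

```python
import random


def group_shuffle_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group appears in both train and test.

    `groups` holds one group id per sample (hypothetical ids in this sketch).
    """
    unique_groups = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    # Hold out whole groups, never individual scans.
    n_test = max(1, round(len(unique_groups) * test_fraction))
    held_out = set(unique_groups[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in held_out]
    test_idx = [i for i, g in enumerate(groups) if g in held_out]
    return train_idx, test_idx


def medical_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix definitions of the metrics named above."""
    return {
        "npv": tn / (tn + fn),  # negative predictive value
        "fpr": fp / (fp + tn),  # false positive rate
        "fnr": fn / (fn + tp),  # false negative rate
        "fdr": fp / (fp + tp),  # false discovery rate
    }


# Hypothetical data: ten scans belonging to five groups.
groups = ["p1", "p1", "p2", "p2", "p2", "p3", "p4", "p4", "p5", "p5"]
train_idx, test_idx = group_shuffle_split(groups, test_fraction=0.4, seed=42)
train_groups = {groups[i] for i in train_idx}
test_groups = {groups[i] for i in test_idx}

# Hypothetical confusion-matrix counts, chosen only to exercise the formulas.
metrics = medical_metrics(tp=90, fp=10, tn=80, fn=20)
```

Because whole groups are held out, `train_groups` and `test_groups` never overlap, which is the property a plain per-image shuffle cannot guarantee.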