Academic Paper

Reproducibility and Performance of Deep Learning Applications for Cancer Detection in Pathological Images
Document Type
Conference
Source
2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 621-630, May 2019
Subject
Computing and Processing
Reproducibility
Performance
Deep Learning
Machine Learning
Container
Docker
Filesystem in Userspace
CUDA
FAIR Principles
Common Workflow Language
Reproducible Experiment Descriptions
Curious Containers
Abstract
Convolutional Neural Networks (CNNs) are used for automatic cancer detection in pathological images. These data-driven experiments are difficult to reproduce because the CNNs may require CUDA-enabled Nvidia GPUs for acceleration, and training is often performed on a large dataset stored on a researcher's computer, inaccessible to others. We introduce the RED file format for reproducible experiment description, in which executable programs are packaged and referenced as Docker container images. Data inputs and outputs are described as network resources using standard transmission and authentication protocols instead of local file paths. Following the FAIR guiding principles, the RED format is based on and compatible with the established Common Workflow Language specification. RED files are interpreted by the accompanying Curious Containers (CC) software. Arbitrarily large datasets are mounted inside containers via FUSE network filesystems such as SSHFS. SSHFS is compared to NFS and a local SSD in synthetic benchmarks and in a CNN training scenario, where SSHFS reduces performance by a factor of 1.8. We are convinced that RED can greatly improve the reproducibility of deep learning workloads and data-driven experiments. This is particularly important in clinical scenarios, where the result of an analysis may contribute to a patient's treatment.
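The abstract's central idea is that the program is referenced by a container image and that every data input and output is referenced by a network location instead of a local file path. The minimal Python sketch below illustrates that idea. It is hypothetical: the field names, image name, URLs, accepted protocol list and the helper `local_references` are illustrative assumptions. They do not reproduce the actual RED schema or the Curious Containers API, and the sketch leaves out authentication details.

```python
import json
from urllib.parse import urlparse

# Hypothetical, simplified experiment description in the spirit of RED.
# Field names, image and URLs are illustrative assumptions, NOT the official RED schema.
experiment = {
    "container": {"image": "registry.example.org/cnn-training:1.0"},
    "inputs": {
        "training_data": {"url": "sftp://data.example.org/datasets/patches/"},
        "labels": {"url": "https://data.example.org/labels.csv"},
    },
    "outputs": {
        "model": {"url": "sftp://results.example.org/models/cnn.h5"},
    },
}

# Network transmission protocols accepted by this sketch (an assumption for illustration).
ALLOWED_SCHEMES = {"http", "https", "sftp", "ssh"}


def local_references(description):
    """Return the names of inputs/outputs whose location is not a network URL."""
    offending = []
    for section in ("inputs", "outputs"):
        for name, resource in description.get(section, {}).items():
            # A plain path such as /home/user/data has an empty scheme and is rejected.
            scheme = urlparse(resource["url"]).scheme
            if scheme not in ALLOWED_SCHEMES:
                offending.append(name)
    return offending


print(json.dumps(experiment, indent=2))
print(local_references(experiment))
```

A description that points at a researcher's local disk, for example `{"url": "/home/user/data"}`, would be flagged. That is the kind of inaccessible reference the abstract identifies as an obstacle to reproducibility.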