Academic Article

A novel framework to enhance the performance of training distributed deep neural networks.
Document Type
Article
Source
Intelligent Data Analysis. 2023, Vol. 27 Issue 3, p753-768. 16p.
Subject
*ARTIFICIAL neural networks
*CONVOLUTIONAL neural networks
*SYSTEMS theory
*DEEP learning
Language
English
ISSN
1088-467X
Abstract
Many frameworks have been proposed for distributed training of deep neural networks (DNNs), most of them built on Apache Spark. Each has its own advantages and disadvantages and leaves room for improvement. While using Apache Spark to implement distributed training systems, we encountered obstacles that significantly degrade system performance and constrain the programming model. For this reason, we developed our own distributed training framework, called Distributed Deep Learning Framework (DDLF), which is completely independent of Apache Spark. The proposed framework overcomes these obstacles and is highly scalable. DDLF makes it simple, natural, and flexible to develop applications that train DNNs in a distributed environment (referred to as distributed training). In this paper, we analyze the obstacles that arise when implementing a distributed training system on Apache Spark and present solutions that overcome them in DDLF. We also present the features of DDLF and show how to implement a distributed DNN training application on this framework. In addition, we conduct experiments training a Convolutional Neural Network (CNN) model on the MNIST and CIFAR-10 datasets in both an Apache Spark cluster and a DDLF cluster to demonstrate the flexibility and effectiveness of DDLF. [ABSTRACT FROM AUTHOR]
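The record itself contains no code, and DDLF's interface is not shown here. Purely for orientation, the sketch below illustrates the kind of workload the abstract describes (distributed training of a small CNN on MNIST), using PyTorch's DistributedDataParallel as a stand-in framework. Every call below is an assumption drawn from that well-known library, not DDLF's actual API.

```python
# Hypothetical sketch: distributed CNN training on MNIST using PyTorch
# DistributedDataParallel. This is NOT DDLF's API; it only illustrates
# the benchmark workload described in the abstract.
# Launch with: torchrun --nproc_per_node=2 train.py
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
from torchvision import datasets, transforms

def main():
    # One process per worker; rank and world size come from the launcher.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    # Small CNN for 28x28 grayscale MNIST digits.
    model = nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(64 * 7 * 7, 10),
    )
    model = DDP(model)  # gradients are all-reduced across workers

    dataset = datasets.MNIST("data", train=True, download=True,
                             transform=transforms.ToTensor())
    sampler = DistributedSampler(dataset)  # shards the data across workers
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()  # DDP synchronizes gradients here
            optimizer.step()
        if rank == 0:
            print(f"epoch {epoch} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```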