Academic Paper

Graph Attention Neural Network Distributed Model Training
Document Type
Conference
Source
2022 IEEE World AI IoT Congress (AIIoT), pp. 447-452, Jun. 2022
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Training
Runtime
Computational modeling
Neural networks
Distributed databases
Data models
Stability analysis
natural language processing
NLP
machine learning
distributed machine learning
distributed systems
big data
pytorch
Language
English
Abstract
The scale of neural language models has increased significantly in recent years. As a result, the training time and resource utilization of larger language models have grown at an even faster rate. In this research, we propose a distributed implementation of a Graph Attention Neural Network model with 120 million parameters and train it on a cluster of eight GPUs. We demonstrate a threefold speedup in model training while maintaining stable accuracy and loss during training and testing, compared to training on a single GPU instance.
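The abstract names PyTorch and an eight-GPU cluster but does not detail the parallelization scheme. Below is a minimal, hypothetical sketch of how such a model could be trained with PyTorch's DistributedDataParallel (one process per GPU, gradients all-reduced each step); the GraphAttentionLayer, dimensions, optimizer, and toy data are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch: distributed data-parallel training of a graph
# attention layer in PyTorch. Not the paper's actual code; architecture
# details are not given in the abstract.
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

class GraphAttentionLayer(nn.Module):
    # Single-head graph attention in the style of Velickovic et al. (2018).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        h = self.W(x)                                   # (N, out_dim)
        n = h.size(0)
        # Attention logits e_ij = LeakyReLU(a^T [h_i || h_j]) for all pairs.
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        # Restrict attention to actual edges, normalize per neighborhood.
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=-1)
        return alpha @ h                                # weighted neighbor sum

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE; one process per GPU.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = GraphAttentionLayer(in_dim=128, out_dim=128).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])         # sync grads across GPUs
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Toy graph; a real run would shard training data across ranks
    # (e.g., with torch.utils.data.distributed.DistributedSampler).
    x = torch.randn(64, 128, device="cuda")
    adj = (torch.rand(64, 64, device="cuda") < 0.1).float()
    adj.fill_diagonal_(1.0)                             # self-loops
    target = torch.randn(64, 128, device="cuda")

    for _ in range(10):
        opt.zero_grad()
        loss = F.mse_loss(model(x, adj), target)
        loss.backward()                                 # DDP all-reduces grads
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, e.g., torchrun --nproc_per_node=8 train.py on an eight-GPU node, DDP replicates the model on each GPU and averages gradients at every step, which is one common route to near-linear speedups of the kind the abstract reports.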