Academic Paper

Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach
Document Type
Conference
Source
GLOBECOM 2023 - 2023 IEEE Global Communications Conference. :3500-3505 Dec, 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Engineering Profession
General Topics for Engineers
Power, Energy and Industry Applications
Signal Processing and Analysis
Training
Data centers
Costs
Processor scheduling
Reinforcement learning
Carbon dioxide
Scheduling
AI-generated content
Job scheduling
Green cloud computing
Multi-agent reinforcement learning
Language
ISSN
2576-6813
Abstract
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which imposes significant cost burdens and environmental challenges due to its substantial energy consumption. Scheduling training jobs across geographically distributed cloud data centers makes it possible to use computing capacity powered by inexpensive, low-carbon energy and to relieve workload imbalance. To tackle the challenge of multi-objective scheduling, i.e., maximizing GPU utilization while reducing operational costs, we propose an algorithm based on multi-agent reinforcement learning and actor-critic methods that learns the optimal collaborative scheduling strategy by interacting with a cloud system built from real-life workload patterns, energy prices, and carbon intensities. Compared with other algorithms, our proposed method improves system utility by up to 28.6%, attributable to higher GPU utilization, lower energy cost, and lower carbon emissions.
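The trade-off described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's utility function or its learning algorithm, so everything below is an assumption made for illustration: the weighted-sum form of `utility`, the weights, the `DataCenter` fields, and the numbers used. The placement rule is a simple greedy baseline, not the multi-agent actor-critic method the paper proposes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCenter:
    # All fields are hypothetical; the abstract does not define a data model.
    name: str
    free_gpus: int
    total_gpus: int
    energy_price: float      # assumed unit: currency per kWh
    carbon_intensity: float  # assumed unit: kg CO2 per kWh


def utility(gpu_utilization: float, energy_cost: float, carbon_kg: float,
            w_util: float = 1.0, w_cost: float = 1.0,
            w_carbon: float = 1.0) -> float:
    """Assumed weighted-sum utility: reward utilization, penalize cost and carbon."""
    return w_util * gpu_utilization - w_cost * energy_cost - w_carbon * carbon_kg


def greedy_place(job_gpus: int, job_kwh: float,
                 centers: List[DataCenter]) -> Optional[str]:
    """Place one job at the data center with the highest assumed utility.

    Returns the chosen data center's name, or None if no center has
    enough free GPUs. This is a greedy baseline, not a learned policy.
    """
    best_score = None
    best_dc = None
    for dc in centers:
        if dc.free_gpus < job_gpus:
            continue  # not enough capacity at this site
        used_after = dc.total_gpus - dc.free_gpus + job_gpus
        score = utility(used_after / dc.total_gpus,
                        job_kwh * dc.energy_price,
                        job_kwh * dc.carbon_intensity)
        if best_score is None or score > best_score:
            best_score, best_dc = score, dc
    if best_dc is None:
        return None
    best_dc.free_gpus -= job_gpus  # reserve the GPUs for the placed job
    return best_dc.name


# Two made-up sites: B has cheaper and cleaner energy than A.
centers = [
    DataCenter("A", free_gpus=8, total_gpus=8, energy_price=0.20, carbon_intensity=0.5),
    DataCenter("B", free_gpus=8, total_gpus=8, energy_price=0.05, carbon_intensity=0.1),
]
first = greedy_place(job_gpus=4, job_kwh=10.0, centers=centers)
second = greedy_place(job_gpus=8, job_kwh=10.0, centers=centers)
```

With these made-up numbers, the first job goes to the cheaper, lower-carbon site, and the second job, which no longer fits there, falls back to the other site. A learned policy such as the one the abstract describes would replace the per-job greedy choice with decisions that account for future arrivals and the behavior of other schedulers.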