Academic Article

V-Recover: Virtual Machine Recovery When Live Migration Fails
Document Type
Periodical
Source
IEEE Transactions on Cloud Computing (IEEE Trans. Cloud Comput.), 11(3):3289-3300, Sep. 2023
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Checkpointing
Cloud computing
Servers
Virtual machining
Virtualization
Resilience
Prototypes
Virtual machine
live migration
fault tolerance
Language
ISSN
2168-7161
2372-0018
Abstract
Live migration is a critical technology used in cloud infrastructures to transfer running virtual machines (VMs). When live migration fails, as it often does, it is critical that any VMs in transit are not lost. There are two primary live migration techniques: pre-copy and post-copy. Pre-copy transfers a VM's memory to the destination before its virtual CPUs are transferred, whereas post-copy does the reverse. Both pre-copy and post-copy will lose the VM if the source machine fails during migration. Additionally, post-copy can lose the VM if the destination machine or the network fails, since the VM's memory and execution state are split across the source and destination machines. We present V-Recover, an approach to recover a VM when the source, destination, or network fails during live migration. V-Recover consists of two techniques: (1) a forward incremental checkpointing (FIC) mechanism to handle source machine failure during both pre-copy and post-copy, and (2) a reverse incremental checkpointing (RIC) mechanism to handle destination or network failure during post-copy. We present the design, implementation, and evaluation of V-Recover in the KVM/QEMU virtualization platform. Our evaluations show that V-Recover effectively recovers a VM upon migration failure with acceptable overheads on migration metrics and application performance.
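The failure cases described in the abstract can be summarized as a small lookup. The sketch below is illustrative only and is not taken from the V-Recover implementation: the enum and function names are hypothetical, and the handling of pre-copy with a destination or network failure (no mechanism needed, on the assumption that the VM is still intact on the source) is inferred from the abstract rather than stated in it.

```python
from enum import Enum
from typing import Optional


class Technique(Enum):
    PRE_COPY = "pre-copy"    # memory is transferred before the virtual CPUs
    POST_COPY = "post-copy"  # virtual CPUs are transferred before the memory


class Failure(Enum):
    SOURCE = "source"
    DESTINATION = "destination"
    NETWORK = "network"


def vm_lost_without_recovery(technique: Technique, failure: Failure) -> bool:
    """Return True if, per the abstract, the VM is lost without V-Recover."""
    if failure is Failure.SOURCE:
        # Both techniques lose the VM when the source machine fails.
        return True
    # Destination or network failure: only post-copy is at risk, because
    # the VM's state is split across the source and destination machines.
    return technique is Technique.POST_COPY


def recovery_mechanism(technique: Technique, failure: Failure) -> Optional[str]:
    """Return the mechanism the abstract assigns to a failure scenario."""
    if failure is Failure.SOURCE:
        return "FIC"  # forward incremental checkpointing
    if technique is Technique.POST_COPY:
        return "RIC"  # reverse incremental checkpointing
    # Assumption: pre-copy still has the complete VM on the source here.
    return None
```

For example, `recovery_mechanism(Technique.POST_COPY, Failure.NETWORK)` returns `"RIC"`, while a source failure maps to `"FIC"` under either technique.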