Journal Article
Stochastic human motion prediction using a quantized conditional diffusion model
Document Type
Article
Source
Knowledge-Based Systems, Volume 309, 30 January 2025
ISSN
0950-7051
Abstract
Human motion prediction is a fundamental task in computer vision, aiming to forecast future human poses based on observed motion sequences. Existing deterministic methods generate a single future motion sequence, neglecting the inherent stochasticity and diversity of human behaviors. To address this limitation, we propose a novel two-stage stochastic human motion prediction framework, termed the Quantized Conditional Diffusion Model (QCDM), which combines a Discrete Motion Quantization Module and a Conditional Motion Generation Module. Specifically, we first design a discrete motion quantization module that leverages Graph Convolutional Networks (GCNs) and one-dimensional temporal convolutions to encode motion sequences into continuous latent representations. These representations are then quantized into discrete latent variables using a learnable codebook. A decoder reconstructs the motion sequence from these discrete variables, preserving key motion patterns while eliminating redundancies. Next, we develop a conditional motion generation module that integrates GCNs and Transformers for denoising spatio-temporal features. The diffusion process iteratively refines noisy motion data by reversing a gradual noising procedure, modeling the distribution of plausible future motions. Action category information and observed historical motion segments are incorporated as conditions into the denoising process, enabling controllable generation of specific motions. Additionally, we introduce a diversity enhancement strategy that penalizes overly similar samples. This encourages the model to explore a wider range of plausible motions, thereby improving the diversity and richness of the prediction results. Extensive experiments demonstrate that the QCDM framework outperforms state-of-the-art methods in stochastic human motion prediction tasks, offering both accuracy and diversity in generated motion sequences.
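The discrete motion quantization the abstract describes is a VQ-VAE-style mechanism: continuous latents are snapped to their nearest entry in a learnable codebook. The sketch below illustrates only that lookup step; the shapes, sizes, and variable names are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    latents:  (T, D) continuous encoder outputs for T time steps
    codebook: (K, D) learnable embedding table (the discrete vocabulary)
    Returns the quantized latents (T, D) and their discrete indices (T,).
    """
    # Squared Euclidean distance between every latent and every code: (T, K)
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)        # discrete latent variables
    return codebook[idx], idx

# Toy usage with illustrative sizes (K=8 codes, D=4 dims, T=5 frames).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
latents = rng.normal(size=(5, 4))
quantized, indices = quantize(latents, codebook)
```

In training, the non-differentiable `argmin` is typically bypassed with a straight-through gradient estimator, and the codebook is updated toward the encoder outputs.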
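The conditional generation module reverses a gradual noising procedure step by step. A minimal DDPM-style reverse step is sketched below; in QCDM the predicted noise would come from the GCN+Transformer denoiser conditioned on action category and observed history, but here `eps_pred` is a stand-in input and the schedule values are illustrative assumptions.

```python
import numpy as np

def reverse_step(x_t, t, eps_pred, betas, rng):
    """One DDPM reverse (denoising) step, producing x_{t-1} from x_t.

    x_t:      noisy motion features at diffusion step t
    eps_pred: noise predicted by the conditional denoiser (stand-in here)
    betas:    noise schedule, one beta per diffusion step
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    # Posterior mean of x_{t-1} given x_t and the predicted noise.
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added at the final step
        mean = mean + np.sqrt(betas[t]) * rng.normal(size=x_t.shape)
    return mean

# Toy usage: 10-step linear schedule, zero predicted noise at t=0.
betas = np.linspace(1e-4, 0.02, 10)
x_t = np.ones(3)
x_prev = reverse_step(x_t, 0, np.zeros(3), betas, np.random.default_rng(0))
```

Iterating this step from pure Gaussian noise down to t = 0 yields one sampled future motion; repeating the loop with fresh noise yields the diverse sample set the method targets.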
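One simple way to realize the diversity enhancement strategy of penalizing overly similar samples is a pairwise hinge penalty that fires whenever two predicted futures fall within a distance margin. This is a sketch of that idea under assumed names and a Euclidean distance; the paper's exact formulation may differ.

```python
import numpy as np

def diversity_penalty(samples, margin=1.0):
    """Hinge penalty on pairs of predicted motion samples that are too similar.

    samples: (S, L) array of S flattened future-motion predictions.
    Returns the sum of max(0, margin - distance) over all sample pairs,
    so identical samples are penalized and well-separated ones are not.
    """
    S = samples.shape[0]
    penalty = 0.0
    for i in range(S):
        for j in range(i + 1, S):
            dist = np.linalg.norm(samples[i] - samples[j])
            penalty += max(0.0, margin - dist)
    return penalty

# Identical samples incur the full margin per pair; distant ones cost nothing.
same = np.zeros((3, 6))
far = np.eye(3) * 100.0
```

Added to the training objective with a small weight, this term pushes the sampler to spread its predictions across distinct plausible motions rather than collapsing to near-duplicates.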