Academic Journal Article

VSAS: Decision Transformer-Based On-Demand Volumetric Video Streaming With Passive Frame Dropping
Document Type
Periodical
Source
IEEE Internet of Things Journal (IEEE Internet Things J.), 11(8):13752-13767, Apr. 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Streaming media
Transform coding
Three-dimensional displays
Bit rate
Transformers
Encoding
Adaptation models
Quality of Experiences (QoEs)
reinforcement learning
volumetric video streaming
Language
ISSN
2327-4662
2372-2541
Abstract
Volumetric video is becoming a popular multimedia application and is envisioned as a fundamental technology for VR, AR, and the emerging metaverse. Commodity RGB-D cameras provide an affordable solution for volumetric video capture, and VR headsets allow immersive and interactive display. From the networking perspective, the primary challenge lies in smooth, high-quality transmission over the Internet, given the enormous data volume and limited bandwidth. The MPEG V-PCC standard has recently stood out as a promising compression and streaming solution that can effectively reduce video size while maintaining high visual quality. Since MPEG V-PCC is largely backward compatible with 2-D video compression standards such as H.264/AVC, it is natural to stream it with DASH, the most widely used 2-D streaming framework. We first propose an integrated framework based on DASH to support MPEG V-PCC Internet streaming. In doing so, we faced three challenges. The first is the lack of a rate-distortion model for MPEG V-PCC encoding parameters. The second is the need for a new bitrate adaptation controller that not only considers the bitrate of the chunks but also chooses chunks with a proper frame rate; we adapt the decision transformer to this problem, extending the success of transformer-based models in natural language processing to decision problems. Finally, to address the stalling issue inherited from DASH, we use a frame-dropping mechanism to eliminate stalling in DASH playback. Our evaluations show that VSAS achieves an average Quality of Experience (QoE) improvement of $1.67\times$ over a range of network conditions.
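The abstract does not specify how the frame-dropping mechanism works, so the following is only an illustrative sketch of the general idea: when a chunk's playback deadline arrives, the player shows the frames that have fully arrived and drops the rest instead of stalling. The function names (`frames_to_play`, `simulate_chunk`), the constant-bandwidth model, and the assumption that frames arrive and are decodable in order are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def frames_to_play(frame_sizes: List[int], bytes_received: int) -> int:
    """Count the leading frames that fit entirely in the bytes received so far.

    Assumption: frames are delivered in order and each frame can be shown
    once all of its bytes have arrived.
    """
    played = 0
    consumed = 0
    for size in frame_sizes:
        if consumed + size > bytes_received:
            break  # this frame is incomplete, so it and all later frames are dropped
        consumed += size
        played += 1
    return played


def simulate_chunk(
    frame_sizes: List[int],
    bandwidth_bytes_per_s: float,
    deadline_s: float,
) -> Tuple[int, int, float]:
    """Return (frames played, frames dropped, stall time avoided in seconds).

    Assumption: bandwidth is constant while the chunk downloads.
    """
    bytes_received = int(bandwidth_bytes_per_s * deadline_s)
    total_bytes = sum(frame_sizes)
    played = frames_to_play(frame_sizes, bytes_received)
    dropped = len(frame_sizes) - played
    # A player that waits for the whole chunk would stall for the time
    # needed to download the missing bytes.
    missing_bytes = max(0, total_bytes - bytes_received)
    stall_avoided = missing_bytes / bandwidth_bytes_per_s
    return played, dropped, stall_avoided


if __name__ == "__main__":
    # Ten 100-byte frames, 400 bytes/s of bandwidth, 2-second deadline.
    print(simulate_chunk([100] * 10, 400.0, 2.0))
```

In this toy setting the trade-off is explicit: playback continues on time at the cost of a lower effective frame rate for the affected chunk, which is why a controller that chooses both bitrate and frame rate would need to account for dropped frames when scoring QoE.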