Academic Article

Complete Task Learning from a Single Human Demonstration by Identifying Fundamental Sub-Tasks for Robot Manipulator Execution
Document Type
Conference
Source
2024 Tenth Indian Control Conference (ICC), pp. 391-396, Dec. 2024
Subject
Aerospace
Power, Energy and Industry Applications
Robotics and Control Systems
Transportation
Metalearning
Service robots
Semantics
Training data
Streaming media
Transformers
Real-time systems
Trajectory
Reliability
Manipulator dynamics
Task learning
imitation learning
classification
video
robot manipulator
Language
English
Abstract
This work tackles the problem of teaching robot manipulators to carry out intricate tasks autonomously by observing human demonstrations. By decomposing the whole task into fundamental sub-tasks, human actions are more easily associated with equivalent robotic ones, so that a reliable model can be developed and manipulator actions programmed efficiently. To enable real-time sub-task identification, the Video Vision Transformer (ViViT), TimeSformer, and VideoMAE models are used separately as encoder architectures and trained on video data to identify the sub-task being performed. The three models are compared, with accuracies aggregated across the different tasks of 64.36% (ViViT), 71.26% (TimeSformer), and 81.03% (VideoMAE). The identified sub-tasks are executed by a robot manipulator using trajectories learned through dynamic movement primitives (DMPs). Real-time experiments show that this approach greatly enhances the robot's ability to reproduce complex tasks reliably and precisely. The proposed solution also demonstrates the system's flexibility to task modifications and its applicability in healthcare, home, and industrial settings. The novelty lies in identifying semantic connections between primitive sequences extracted from video data.
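To illustrate the sub-task identification stage described in the abstract, here is a minimal sketch that classifies a short demonstration clip with a VideoMAE video classifier from the Hugging Face transformers library. The checkpoint name, the five-element sub-task label set, and the 16-frame clip are illustrative assumptions rather than details taken from the paper, and the classification head would need to be fine-tuned on the demonstration videos before use.

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

# Hypothetical sub-task label set; the paper's actual labels are not listed in this record.
SUBTASK_LABELS = ["reach", "grasp", "move", "place", "release"]

# "MCG-NJU/videomae-base" is a public pretrained checkpoint; the classification head
# created here is newly initialised and must be fine-tuned on demonstration clips.
processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base")
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",
    num_labels=len(SUBTASK_LABELS),
)

# A clip of 16 RGB frames sampled from a human demonstration (dummy frames here).
frames = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(16)]
inputs = processor(frames, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print("Predicted sub-task:", SUBTASK_LABELS[int(logits.argmax(-1))])
```

For the execution stage, the abstract states that trajectories are learned through dynamic movement primitives. The sketch below rolls out a single-DOF discrete DMP in the standard Ijspeert-style formulation; the gains, number of basis functions, and width heuristic are assumed values for illustration, and in practice the forcing-term weights would be fitted to the demonstrated trajectory of each identified sub-task.

```python
import numpy as np

def dmp_rollout(y0, goal, weights, tau=1.0, dt=0.01,
                alpha_z=25.0, beta_z=6.25, alpha_x=1.0):
    """Roll out a single-DOF discrete dynamic movement primitive (DMP).

    Transformation system:  tau * dz = alpha_z * (beta_z * (goal - y) - z) + f(x)
                            tau * dy = z
    Canonical system:       tau * dx = -alpha_x * x
    Forcing term f(x): normalised weighted sum of Gaussian basis functions,
    scaled by x * (goal - y0) so it vanishes as the phase variable x decays.
    """
    n_basis = len(weights)
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))  # basis centres in phase space
    widths = n_basis ** 1.5 / centers / alpha_x                  # common width heuristic

    y, z, x = float(y0), 0.0, 1.0
    traj = [y]
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)
        forcing = (psi @ weights) / (psi.sum() + 1e-10) * x * (goal - y0)
        dz = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        traj.append(y)
    return np.array(traj)

# With zero forcing weights the DMP reduces to a critically damped attractor
# that converges smoothly from the start position to the goal.
trajectory = dmp_rollout(y0=0.0, goal=0.3, weights=np.zeros(20))
print(trajectory[-1])  # close to 0.3
```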