Academic Paper

Research on Offline Classification and Counting Algorithm of Long Fitness Video
Document Type
Conference
Source
2022 7th International Conference on Image, Vision and Computing (ICIVC), pp. 410-417, Jul. 2022
Subject
Computing and Processing
Deep learning
Adaptation models
Convolution
Motion segmentation
Computational modeling
Wearable computers
Feature extraction
Action Recognition
Action Counting
Temporal Action Detection
Abstract
With the staging of sporting events such as the Winter Olympics, "fitness" has once again become a topic of widespread public concern, and scientific fitness has become a new requirement for people engaging in fitness activities. Scientific and efficient fitness guidance is often inseparable from personal exercise data. At present, methods that quantify motion data by recognizing and counting exercises with wearable devices such as sensors suffer from poor convenience and poor visualization. This paper applies deep learning methods to motion classification and counting, making up for the shortcomings of these traditional quantification methods. The goal is to recognize and count five common types of exercise in long videos: squats, sit-ups, push-ups, pull-ups, and jumping jacks. The main work and innovations are as follows: (1) This paper proposes an algorithm for motion classification and counting in long videos. The algorithm first uses the temporal action detection model MPGCN, proposed in this paper, to locate and classify valid segments of a long video, and then uses the ATRepNet motion counting model, also proposed in this paper, to count repetitions in the cropped segments. (2) Large variation in action duration and action amplitude in long videos degrades the performance of the temporal action detection model. To address this, this paper designs a multi-scale (MS) graph convolution layer that is integrated into the PGCN model for multi-scale information fusion, yielding the MPGCN model. On a public dataset, the mean Average Precision (mAP) of localization and classification with dual-stream features is improved by 1.2%. (3) Complex video backgrounds in this setting lead to poor counting accuracy. In response, this paper improves a hybrid attention mechanism and integrates it into the counting model RepNet, yielding the ATRepNet model. Finally, on a self-made test dataset, the counting accuracy is improved by 1.5%.
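The record above only names the two proposed modules; their exact architectures are not given here. As a rough, hypothetical NumPy sketch of the two ideas the abstract describes — a graph convolution that fuses information over several neighbourhood scales, and a hybrid attention that reweights features along both the channel and temporal axes — one might write (all function and variable names below are illustrative, not from the paper):

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_scale_graph_conv(H, A, weights):
    """Hypothetical multi-scale graph convolution: each scale k
    propagates node features over k-hop neighbourhoods of the
    row-normalized graph, and the per-scale outputs are summed."""
    N = A.shape[0]
    A_hat = A + np.eye(N)                               # add self-loops
    A_norm = np.diag(1.0 / A_hat.sum(axis=1)) @ A_hat   # row-normalize
    out = np.zeros((N, weights[0].shape[1]))
    A_k = np.eye(N)
    for W in weights:                                   # one projection per scale
        A_k = A_k @ A_norm                              # k-hop propagation
        out += A_k @ H @ W                              # fuse scales by summation
    return np.maximum(out, 0.0)                         # ReLU

def hybrid_attention(X):
    """Hypothetical hybrid (channel + temporal) attention for a
    (C, T) feature map: gate channels first, then time steps."""
    c = _sigmoid(X.mean(axis=1, keepdims=True))   # (C, 1) channel gates
    Xc = X * c
    s = _sigmoid(Xc.mean(axis=0, keepdims=True))  # (1, T) temporal gates
    return Xc * s
```

This is only a minimal sketch of the general techniques; the paper's actual MS layer, its integration into PGCN, and the attention variant used in ATRepNet would have to be taken from the full text.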