Academic article

Empowering lightweight video transformer via the kernel learning
Document Type
article
Source
Electronics Letters, Vol 60, Iss 9, Pp n/a-n/a (2024)
Subject
artificial intelligence
multimedia computing
video signal processing
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
Language
English
ISSN
1350-911X
0013-5194
Abstract
Video transformers achieve superior performance in video recognition, but they still require substantial computation and memory resources. To improve computational efficiency, a kernel-based video transformer is proposed, comprising: (1) a new formulation of the video transformer via kernel learning, presented to better explain its individual components; (2) a lightweight kernel-based spatial–temporal multi-head self-attention block that learns a compact joint spatial–temporal video feature; (3) an adaptive-score position embedding method that improves the flexibility of the video transformer. Experimental results on several action recognition datasets demonstrate the effectiveness of the proposed method. Pretrained only on ImageNet-1K, the method achieves a preferable balance between computation and accuracy while requiring 7× fewer parameters and 13× fewer floating-point operations than comparable methods.
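The abstract does not give the paper's exact kernel formulation, but the general idea behind kernel-based self-attention can be sketched as follows: the softmax similarity is replaced by an inner product of kernel feature maps, which lets the attention be computed in linear rather than quadratic time in the sequence length. The sketch below uses the common `elu(x) + 1` feature map as an illustrative assumption; the paper's actual learned kernel and spatial–temporal block will differ.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a common positive feature map for kernelized attention
    # (an illustrative choice, not necessarily the paper's learned kernel)
    return np.where(x > 0, x + 1.0, np.exp(x))

def kernel_attention(Q, K, V):
    # Standard attention computes softmax(Q K^T) V in O(N^2 d).
    # Kernelized attention approximates it as
    #   phi(Q) (phi(K)^T V) / (phi(Q) sum_j phi(K_j)),
    # which costs O(N d^2) -- the source of the efficiency gain.
    phi_q = feature_map(Q)              # (N, d)
    phi_k = feature_map(K)              # (N, d)
    kv = phi_k.T @ V                    # (d, d_v), aggregated once
    z = phi_q @ phi_k.sum(axis=0)       # (N,) per-query normalizer
    return (phi_q @ kv) / z[:, None]    # (N, d_v)

rng = np.random.default_rng(0)
N, d = 8, 4                             # e.g. 8 spatial-temporal tokens
Q, K, V = rng.normal(size=(3, N, d))
out = kernel_attention(Q, K, V)
print(out.shape)                        # (8, 4)
```

Because the `(d, d_v)` summary `kv` is shared across all queries, the per-token cost no longer grows with the number of video tokens, which is why kernel formulations suit lightweight video transformers.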