Academic Paper

Vision Transformer and Its Application in Penguin Classification
Document Type
Conference
Author
Source
2022 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), pp. 214-220, Oct. 2022
Subject
Computing and Processing
Robotics and Control Systems
Training
Scattering
Object detection
Transformers
Prediction algorithms
Feature extraction
Classification algorithms
Image classification
Vision Transformer
Computer Vision
Attention
Penguin
Language
Abstract
Attention, also known as the attention mechanism, is a resource allocation scheme for addressing the problem of information overload. The Transformer is a sequence-to-sequence (Seq2Seq) model that can be viewed as an encoder-decoder architecture, but it is not an RNN: it is built entirely on the attention mechanism and fully connected layers, and on large datasets it achieves higher accuracy than RNNs. This paper applies one popular model, the Vision Transformer, to classify penguins by genus. Incorrect classifications still occur after applying this transformer. The problem may lie in the type of image: for images containing many penguins and for underwater images, prediction accuracy drops. To address this, three measures may help: applying an object detection algorithm, applying special processing to underwater images, and using more data.
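As a rough illustration of the approach described in the abstract (not the authors' code), the sketch below fine-tunes a pretrained Vision Transformer for genus-level penguin classification using torchvision. The dataset path, number of genera, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch, assuming torchvision >= 0.13 and an ImageFolder-style dataset
# with one subfolder per penguin genus. All names and values below are assumptions.
import torch
from torch import nn
from torchvision import datasets
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_GENERA = 6          # hypothetical: number of penguin genera in the dataset
DATA_DIR = "penguins/"  # hypothetical: ImageFolder layout, one folder per genus

weights = ViT_B_16_Weights.IMAGENET1K_V1
preprocess = weights.transforms()  # resize, crop, and normalize as the backbone expects
dataset = datasets.ImageFolder(DATA_DIR, transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = vit_b_16(weights=weights)  # ImageNet-pretrained ViT-B/16 backbone
# Replace the classification head so it predicts penguin genera instead of ImageNet classes.
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_GENERA)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one fine-tuning pass over the data
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

For the failure cases mentioned in the abstract (crowded scenes and underwater shots), one could, for example, run an object detector first to crop individual penguins before classification; that step is outside this sketch.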