Academic Paper

Vision Transformer and Its Application in Penguin Classification
Document Type
Conference
Author
Source
2022 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), pp. 214-220, Oct. 2022
Subject
Computing and Processing
Robotics and Control Systems
Training
Scattering
Object detection
Transformers
Prediction algorithms
Feature extraction
Classification algorithms
Image classification
Vision Transformer
Computer Vision
Attention
Penguin
Language
Abstract
Attention, also known as the attention mechanism, is a resource allocation scheme for addressing the problem of information overload. The Transformer is a sequence-to-sequence (Seq2Seq) model that can be viewed as an encoder-decoder architecture, but it is not an RNN: it is built entirely on the attention mechanism and fully connected layers, and on large datasets it achieves higher accuracy than RNNs. This paper applies one popular model, the Vision Transformer, to classify penguins by genus. Incorrect classifications still occur after applying this transformer. The problem may lie in the type of image: for images containing many penguins and for underwater images, prediction accuracy drops. To address this, three measures may help: applying an object detection algorithm, applying special processing to underwater images, and using more data.
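As a rough illustration of the approach described in the abstract (not the authors' code), the sketch below fine-tunes a pretrained Vision Transformer for genus-level penguin classification using torchvision. The dataset path, number of genera, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch, assuming torchvision >= 0.13 and an ImageFolder-style dataset
# with one subfolder per penguin genus. All names and values below are assumptions.
import torch
from torch import nn
from torchvision import datasets
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_GENERA = 6          # hypothetical: number of penguin genera in the dataset
DATA_DIR = "penguins/"  # hypothetical: ImageFolder layout, one folder per genus

weights = ViT_B_16_Weights.IMAGENET1K_V1
preprocess = weights.transforms()  # resize, crop, and normalize as the backbone expects
dataset = datasets.ImageFolder(DATA_DIR, transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = vit_b_16(weights=weights)  # ImageNet-pretrained ViT-B/16 backbone
# Replace the classification head so it predicts penguin genera instead of ImageNet classes.
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_GENERA)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one fine-tuning pass over the data
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

For the failure cases mentioned in the abstract (crowded scenes and underwater shots), one could, for example, run an object detector first to crop individual penguins before classification; that step is outside this sketch.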