Academic Paper

Neural Tree Decoder for Interpretation of Vision Transformers

Document Type

Periodical

Author

Source

IEEE Transactions on Artificial Intelligence (IEEE Trans. Artif. Intell.), 5(5):2067-2078, May 2024

Subject

Computing and Processing
Visualization
Transformers
Decision trees
Decision making
Task analysis
Decoding
Computational modeling
explainable artificial intelligence
interpretable artificial intelligence
machine learning
visualization of decision and learning models

Language

ISSN

2691-4581

Abstract

In this study, we propose a novel vision transformer neural tree decoder (ViT-NeT) that is interpretable and highly accurate for fine-grained visual categorization (FGVC). A ViT serves as the backbone, and to overcome the limitations of the ViT, its output context image patches are fed to the proposed NeT. NeT aims to classify fine-grained objects, which have similar inter-class correlations and different intra-class correlations, more accurately. ViT-NeT can also describe its decision-making process and visually interpret its results through tree structures and prototypes. Because ViT-NeT is designed not only to improve FGVC classification performance but also to provide human-friendly interpretation, it is effective in resolving the tradeoff between performance and interpretability. We compared the performance of ViT-NeT with other state-of-the-art (SoTA) methods on the widely used FGVC benchmark datasets CUB-200-2011, Stanford Dogs, Stanford Cars, NABirds, and iNaturalist. The proposed method shows promising quantitative and qualitative performance compared with previous SoTA methods, as well as excellent interpretability.