Academic Paper
VTAHMER: Vision-Text Alignment Based Handwritten Mathematical Expression Recognition
Document Type
Conference
Author
Source
2023 China Automation Congress (CAC), pp. 8497-8502, Nov. 2023
Subject
Language
ISSN
2688-0938
Abstract
Handwritten mathematical expression recognition (HMER) has attracted considerable attention in the pattern recognition community. However, it remains challenging because handwriting style and quality vary widely in real applications. To this end, we propose VTAHMER, an encoder-decoder method that uses scale attention and cross-modal learning. The method improves the efficiency and effectiveness of multi-scale mathematical symbol recognition through an adaptive convolutional kernel weighting approach. In addition, it learns invariant features from both the handwritten expression images and their corresponding LaTeX labels. Specifically, the image encoder merges attention information across different kernel sizes to accommodate varying receptive fields. We also design a text encoder for LaTeX formulas and train it contrastively together with the image encoder. This captures the latent semantic information shared between the vision and text domains, which benefits downstream recognition. Experimental results show that our model achieves improved recognition rates of 61.4%, 61.8%, and 64.8% on the CROHME 2014, 2016, and 2019 datasets, respectively.
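The abstract says the image and text encoders are trained contrastively but does not give the objective. As a rough illustration only, the sketch below assumes a symmetric InfoNCE loss of the kind popularized by CLIP-style models; the function names, the temperature value of 0.07, and the use of plain NumPy arrays in place of real encoder outputs are all assumptions, not details from the paper.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: arrays of shape (batch, dim); row i of each
    is assumed to come from the same (image, LaTeX label) pair.
    The temperature is a hypothetical default, not taken from the paper.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix; matched pairs lie on the diagonal.
    logits = image_emb @ text_emb.T / temperature
    idx = np.arange(logits.shape[0])

    # Image-to-text: each row should pick its own column.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each column should pick its own row.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


# Toy check: perfectly aligned pairs versus deliberately mismatched pairs.
aligned = contrastive_alignment_loss(np.eye(4), np.eye(4))
mismatched = contrastive_alignment_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
```

In this toy check the aligned batch yields a loss near zero and the mismatched batch a much larger one, which is the behaviour a contrastive objective relies on to pull matching image and LaTeX embeddings together.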