Academic Paper

VTAHMER: Vision-Text Alignment Based Handwritten Mathematical Expression Recognition
Document Type
Conference
Source
2023 China Automation Congress (CAC), pp. 8497-8502, Nov. 2023
Subject
Aerospace
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Handwriting recognition
Visualization
Text recognition
Semantics
Symbols
Feature extraction
Transformers
handwritten mathematical expression recognition
Multi-scale symbol recognition
Vision-Text Alignment
Language
ISSN
2688-0938
Abstract
Handwritten mathematical expression recognition (HMER) has attracted considerable attention in the pattern recognition community. However, it remains challenging because of varied personal writing styles and image quality in real applications. To this end, this paper proposes VTAHMER, an encoder-decoder method that uses scale attention and cross-modal learning. The method improves the efficiency and effectiveness of multi-scale mathematical symbol recognition through an adaptive convolutional kernel weighting approach. In addition, invariant features are learned separately from the handwritten expression images and from their corresponding labels represented in LaTeX form. Specifically, the image encoder merges attention information across different kernel sizes to accommodate varying receptive fields. We also design a text encoder for LaTeX formulas and train it contrastively together with the image encoder. This captures the latent semantic information shared between the vision and text domains, which benefits downstream recognition. Experimental results show that our model achieves better recognition rates of 61.4%, 61.8% and 64.8% on the CROHME 2014, 2016 and 2019 datasets, respectively.
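The abstract names two mechanisms but gives no formulas. The sketch below illustrates generic versions of both, under stated assumptions. The adaptive convolutional kernel weighting is modelled as a softmax over per-branch scores that reweights features from branches with different kernel sizes, similar in spirit to selective kernel attention. The vision-text contrastive learning is modelled as a CLIP-style symmetric InfoNCE loss with a temperature of 0.07. All function names, the temperature value, and the plain-Python vectors are hypothetical illustrations, not the authors' implementation.

```python
import math

# Hypothetical helpers illustrating the two ideas named in the abstract.
# Neither is taken from the VTAHMER paper; both are generic sketches.


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multiscale(branch_features, branch_scores):
    """Adaptive kernel weighting (assumed selective-kernel-style fusion).

    branch_features: one feature vector per convolution branch
                     (e.g. 3x3, 5x5, 7x7 kernels), all the same length.
    branch_scores:   one unnormalised attention score per branch; in a real
                     model these would be predicted from the input image.
    Returns the attention-weighted sum of the branch features.
    """
    weights = softmax(branch_scores)
    dim = len(branch_features[0])
    return [sum(w * f[i] for w, f in zip(weights, branch_features)) for i in range(dim)]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss (assumed CLIP-style) over a batch of pairs.

    image_embs[i] and text_embs[i] embed the same expression (an image and
    its LaTeX label); every other pairing in the batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[k], where the correct class is the diagonal k
        loss = 0.0
        for k, row in enumerate(rows):
            m = max(row)
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_sum_exp - row[k]
        return loss / len(rows)

    # Image-to-text direction uses rows; text-to-image uses columns
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


if __name__ == "__main__":
    # Equal branch scores give equal weights, so the fused vector is the mean
    print(fuse_multiscale([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    # Matched pairs should score a much lower loss than mismatched pairs
    print(contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
    print(contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]))
```

The symmetric form averages the image-to-text and text-to-image directions, so both encoders are pulled toward a shared embedding space. Whether VTAHMER uses this exact objective, temperature, or fusion rule is not stated in the abstract.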