Academic Paper

Enhanced Named Entity Recognition algorithm for financial document verification
Document Type
Original Paper
Source
The Journal of Supercomputing: An International Journal of High-Performance Computer Design, Analysis, and Use. 79(17):19431-19451
Subject
Automatic document verification
Named Entity Recognition
Document summarization
Spell-checker
Natural language processing
Language
English
ISSN
0920-8542
1573-0484
Abstract
Many enterprise systems are document-intensive and require extensive manual verification, a process that is time-consuming and prone to leaving errors undetected. A general automatic or semi-automatic document verification system would therefore be useful. However, because of the nature of natural language, context is an important factor. In this research, the target context is financial documents, which have attracted considerable interest recently. An automatic document verification model based solely on entities (mainly those encountered in financial documents) was evaluated experimentally. Summary reports were verified against the original documents by searching the original documents for matches to the entities in each summary. The success of the verification process was evaluated by comparing the proposed algorithm with named entity recognition algorithms from the literature. A Kaggle dataset prepared specifically for this purpose was used to match entities from the summaries within the original documents. For financial documents only, the average document verification accuracy of the named entity recognition algorithms from the literature was 85.36%, whereas the proposed entity recognition algorithm reached 88.80%. The average document verification times of the algorithms from the literature and of the developed algorithm were 2.43 s and 2.48 s, respectively. In conclusion, applying both the BERT-base-cased classification model and rule-based approaches tailored to the context enhances the entity verification process at an insignificant time cost. Consequently, even with limited data and rules, there appears to be an opportunity to automate the document verification process with the support of both the BERT-base-cased classification model and rule-based approaches.
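The entity-matching step described in the abstract (extract entities from a summary, then look for each one in the original document) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses only hand-written regular-expression rules, omits the BERT-base-cased classifier entirely, and the pattern set (`PATTERNS`), the normalization in `normalize`, and the score returned by `verify_summary` (share of summary entities found in the original) are all hypothetical choices made for this example.

```python
import re
from typing import Set, Tuple

# Hypothetical rule-based patterns for entity types common in financial text.
# These are illustrative assumptions, not the rules used in the paper.
PATTERNS = {
    "MONEY": r"\$\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:million|billion))?",
    "PERCENT": r"\d+(?:\.\d+)?\s?%",
    "YEAR": r"\b(?:19|20)\d{2}\b",
}


def normalize(entity: str) -> str:
    """Drop whitespace and thousands separators, lowercase, so '$1,200' == '$1200'."""
    return re.sub(r"[\s,]", "", entity).lower()


def extract_entities(text: str) -> Set[str]:
    """Return the normalized strings matched by any rule pattern."""
    found = set()
    for pattern in PATTERNS.values():
        # All groups are non-capturing, so findall returns whole matches.
        for match in re.findall(pattern, text, flags=re.IGNORECASE):
            found.add(normalize(match))
    return found


def verify_summary(summary: str, original: str) -> Tuple[float, Set[str]]:
    """Score a summary by the share of its entities that also occur in the original.

    Returns (score, missing), where `missing` holds summary entities
    that could not be matched in the original document.
    """
    summary_entities = extract_entities(summary)
    if not summary_entities:
        return 1.0, set()  # nothing to verify
    original_entities = extract_entities(original)
    matched = summary_entities & original_entities
    missing = summary_entities - original_entities
    return len(matched) / len(summary_entities), missing


if __name__ == "__main__":
    original = ("Revenue rose 12.5% to $4.5 billion in 2022, "
                "while costs fell to $1,200 million.")
    print(verify_summary("In 2022 revenue grew 12.5 % to $4.5 billion.", original))
    # (1.0, set())
    print(verify_summary("In 2022 revenue grew 15% to $4.5 billion.", original))
    # (0.6666666666666666, {'15%'})
```

Normalizing before comparison lets formatting variants such as "12.5 %" and "12.5%" match, while a changed figure ("15%") is reported as missing. In a fuller system, a learned model would presumably handle entity types (names, organizations) that simple rules cannot capture reliably.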