Academic paper

Code recognition by means of improved distance and tuned dictionary
Document Type
Conference
Source
2017 IEEE URUCON, pp. 1-4, Oct. 2017
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Dictionaries
Character recognition
Optical character recognition software
Databases
Text recognition
Neural networks
Intelligent Character Recognition
Tesseract
Language Processing
Open Source Software
Chatterbot
Language
Abstract
This paper presents a tuned Intelligent Code Recognition (ICR) model that uses a self-adapting dictionary combined with a variant of the Levenshtein distance to check word similarity. It is part of a broader project named PTAH (Procesamiento de Trámites con Asistente Hispanohablante), an intelligent chatterbot that uses this ICR to collect knowledge from documents. Because this step is critical, the image-to-text conversion must be as accurate as possible: the success of information extraction determines the quality of the knowledge database and, in turn, the accuracy of the chatterbot's answers to queries. The paper includes a blueprint of the overall project and of the ICR, the progression of statistics obtained as successive improvements were applied to the ICR, and a final analysis of the failures.
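The abstract's core idea, correcting OCR output against a dictionary by edit distance, can be illustrated with a minimal sketch. This is not the authors' method: the record does not describe their modified distance or how their dictionary adapts, so the code below uses the standard Levenshtein distance, and the `TunedDictionary` class, its `max_distance` threshold, and its count-based tie-breaking are all hypothetical names and choices made for illustration only.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance: insert, delete, substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


class TunedDictionary:
    """Hypothetical dictionary that adapts by counting accepted corrections.

    Assumption: "self-adapting" is modelled here as a usage counter that
    breaks ties between equally distant candidates. The paper may do
    something quite different.
    """

    def __init__(self, words: Iterable[str], max_distance: int = 2) -> None:
        self.counts = {w.lower(): 1 for w in words}
        self.max_distance = max_distance

    def correct(self, token: str) -> Optional[str]:
        """Return the closest dictionary word, or None if none is close enough."""
        token = token.lower()
        best, best_key = None, None
        for word, count in self.counts.items():
            distance = levenshtein(token, word)
            if distance > self.max_distance:
                continue
            key = (distance, -count)  # nearer first, then more frequently used
            if best_key is None or key < best_key:
                best, best_key = word, key
        if best is not None:
            self.counts[best] += 1  # adapt: favour this word in future ties
        return best


if __name__ == "__main__":
    dictionary = TunedDictionary(["tramite", "documento"])
    print(dictionary.correct("tramlte"))  # an OCR-style misread of "tramite"
```

The linear scan over the dictionary keeps the sketch short; a larger vocabulary would normally call for an index such as a BK-tree or a length-based prefilter.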