Academic Paper

Historical Arabic Manuscripts Text Recognition Using Convolutional Neural Network
Document Type
Conference
Source
2020 6th Conference on Data Science and Machine Learning Applications (CDMA), pp. 37-42, Mar. 2020
Subject
Computing and Processing
Image segmentation
Text recognition
Feature extraction
Writing
Shape
Convolution
Neurons
Optical Character Recognition
Arabic manuscripts
CNN
Language
Abstract
The Islamic heritage is rich in Arabic manuscripts that contain valuable knowledge of Islamic Sciences, such as Hadeethe, Tafseer and Akhidah. However, these manuscripts are hard to read, and there is a need to convert them into a publishable form. Therefore, this paper proposes a method for recognizing the text in images of these manuscripts and converting it into readable text that can be copied and saved for use in further research. The main steps of our algorithm are as follows: 1) enhancing the image (preprocessing); 2) dividing the manuscript image into lines and characters (segmentation); 3) building the dataset of Arabic characters; 4) recognizing the text (classification). In the classification stage, we apply a Convolutional Neural Network (CNN) to three created datasets, and it achieves an accuracy ranging from 74.29% to 88.20%.
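As an illustration of the segmentation step (step 2), the sketch below splits a binarized page into text lines, and each line into column runs, using projection profiles. This is an assumption made for illustration only: the abstract does not say which segmentation technique the authors use. The names runs_above, segment_lines and segment_columns, the min_ink parameter, and the 1 = ink convention are all hypothetical. Because Arabic script is cursive, the column runs found this way correspond to connected pieces of writing rather than to individual characters.

```python
from typing import List, Tuple

# Assumed convention (not from the paper): binary image, 1 = ink, 0 = background.
Image = List[List[int]]


def runs_above(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile >= min_ink."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Find text lines as row ranges using the horizontal projection profile."""
    row_profile = [sum(row) for row in image]
    return runs_above(row_profile, min_ink)


def segment_columns(image: Image, top: int, bottom: int,
                    min_ink: int = 1) -> List[Tuple[int, int]]:
    """Find column runs inside rows [top, bottom) using the vertical profile."""
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom))
                   for c in range(width)]
    return runs_above(col_profile, min_ink)


# Toy 6x6 "page" with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

for top, bottom in segment_lines(page):
    print((top, bottom), segment_columns(page, top, bottom))
```

On real manuscript scans the page would first need the preprocessing of step 1 (for example binarization and noise removal), and min_ink would likely have to be raised so that stray specks do not merge adjacent lines.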