Academic Paper
Page Layout Analysis of Complex Document Images
Document Type
Conference
Source
2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-5, Jun 2024
Subject
Language
ISSN
2473-7674
Abstract
Page layout analysis of complex document images is the process of recovering information about the structure of document images and extracting the visual content inside them. It covers general page layout, text segmentation, region labeling, and font identification. As a result, it can be used to improve the performance of document processing tasks such as document retrieval and optical character recognition. It provides automatic methods for parsing the layout of core components, such as text blocks, pictures, tables, and mathematical formulas. Page layout analysis recovers the relationships among visual elements in order to understand the semantic content of a document. A typical page layout analysis pipeline comprises low-level tasks, including image pre-processing and noise reduction, followed by mid-level tasks such as text segmentation, region labeling and localization, and font identification. Finally, the required output is produced by higher-level tasks such as table extraction and text tracking. Combining these tasks allows for a better understanding of page structure, enabling improved document retrieval and optical character recognition. Additionally, it can support other areas, including document summarization and page classification.
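The low-level and mid-level stages named in the abstract can be illustrated with a minimal Python sketch. It is not the method of the paper, which this record does not describe. The function names, the fixed binarization threshold, the row-projection segmentation, and the ink-density labeling rule are all assumptions chosen for illustration, and the page is a tiny hand-made grid rather than a real scan.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale page: 0 = black ink, 255 = white paper
Band = Tuple[int, int]    # inclusive (top_row, bottom_row) of a region


def binarize(img: Image, threshold: int = 128) -> Image:
    """Low-level step: 1 where a pixel is ink, 0 where it is background."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def denoise(binary: Image) -> Image:
    """Low-level step: drop ink pixels with no 4-connected ink neighbour."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if not any(0 <= ny < h and 0 <= nx < w and binary[ny][nx]
                       for ny, nx in neighbours):
                out[y][x] = 0
    return out


def segment_rows(binary: Image) -> List[Band]:
    """Mid-level step: split the page into bands of consecutive inked rows."""
    bands: List[Band] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(binary) - 1))
    return bands


def label_band(binary: Image, band: Band, dense: float = 0.6) -> str:
    """Mid-level step: toy region label from ink density (arbitrary cutoff)."""
    top, bottom = band
    pixels = [px for row in binary[top:bottom + 1] for px in row]
    density = sum(pixels) / len(pixels)
    return "picture" if density >= dense else "text"


def analyze(img: Image) -> List[Tuple[Band, str]]:
    """Run pre-processing, noise reduction, segmentation, and labeling."""
    binary = denoise(binarize(img))
    return [(band, label_band(binary, band)) for band in segment_rows(binary)]


# Hand-made 8x8 page: a sparse "text" band, one speck of noise, a solid block.
W, K = 255, 0
page = [
    [W] * 8,
    [K, K, W, W, K, K, W, W],
    [K, K, W, W, K, K, W, W],
    [W, W, W, K, W, W, W, W],   # isolated speck, removed by denoise()
    [W] * 8,
    [K] * 8,
    [K] * 8,
    [W] * 8,
]
regions = analyze(page)
```

On this toy page, `regions` holds two bands: rows 1 to 2 labeled as text and rows 5 to 6 labeled as a picture, with the isolated speck on row 3 discarded. A real system would replace each stage with something more robust, for example adaptive thresholding and connected-component or learned segmentation, and would add the higher-level tasks such as table extraction.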