Academic Paper

Automatic Extractive Text Summarization of Hindi Text using Deep learning approach
Document Type
Conference
Source
2023 International Conference on Computer Communication and Informatics (ICCCI), pp. 1-8, Jan. 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Deep learning
Training
Measurement
Fuzzy logic
Databases
Semantic segmentation
Feature extraction
Automatic Text Summarization
Genetic Algorithm
real coded genetic algorithm
TF-IDF
Language
ISSN
2473-7577
Abstract
Automatic Text Summarization (ATS) is in great demand as a way to sift quickly through the vast amount of textual material available online. In this research, a Real Coded Genetic Algorithm (RCGA) is applied to Hindi movie reviews from a Kaggle dataset in order to propose an ATS approach for the Hindi language. The approach consists of five stages: pre-processing, feature extraction, processing, paragraph ranking, and summary output. In a rigorous study of many feature sets, sentence similarity and semantic segmentation features are combined with other features to produce the evaluation metrics. Different compression rates are evaluated, and the highest-scoring sentences are extracted as the corpus summaries. The extractive ATS approach provides a summary reduction of 65% when compared to current summarization methods. A text summarization tool condenses the text and shows the user only the crucial information. Significant sentences are chosen by an extraction approach based on thematic terms. Hindi stop-words are eliminated before thematic terms are chosen, and stemming is used to find the root words of the sentences. Stop-word removal discards uninformative words from the input material, while stemming groups together words that share the same root. The method scores each sentence by how frequently the roots of thematic terms appear in it. Sentences with the highest scores are prioritized for inclusion in the summary. To make the generated summary more similar to a human-written one, the selected sentences are then further processed to eliminate superfluous terms.
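The thematic-term scoring described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it omits the RCGA and the additional features entirely, and the stop-word list, the suffix-stripping stemmer, the function names, and the parameters `compression_rate` (here taken to mean the fraction of sentences kept) and `n_thematic` are all assumptions introduced for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real system would use a full Hindi list).
STOP_WORDS = {"है", "हैं", "और", "की", "का", "के", "में", "से", "को", "यह", "था", "थी"}

# Hypothetical suffix list for a crude stemmer, longest suffixes first.
SUFFIXES = ("ियों", "ों", "ें", "ी", "े", "ा")


def stem(word):
    """Strip the first matching suffix, keeping at least two characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 1:
            return word[: -len(suf)]
    return word


def split_sentences(text):
    """Split on the Hindi danda and common Latin sentence terminators."""
    return [s.strip() for s in re.split(r"[।!?.]", text) if s.strip()]


def tokenize(sentence):
    """Drop stop-words, then reduce the remaining words to their stems."""
    return [stem(w) for w in sentence.split() if w not in STOP_WORDS]


def summarize(text, compression_rate=0.35, n_thematic=5):
    """Return the top-scoring sentences in their original order."""
    sentences = split_sentences(text)
    token_lists = [tokenize(s) for s in sentences]

    # Thematic terms: the most frequent stems across the whole text.
    freq = Counter(t for toks in token_lists for t in toks)
    thematic = {t for t, _ in freq.most_common(n_thematic)}

    # Sentence score: number of thematic-stem occurrences in the sentence.
    scores = [sum(1 for t in toks if t in thematic) for toks in token_lists]

    # Keep a fraction of the sentences, at least one.
    keep = max(1, round(len(sentences) * compression_rate))
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:keep]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    review = "फिल्म अच्छी है। कहानी और फिल्म का संगीत अच्छा है। मौसम ठंडा था।"
    for line in summarize(review, compression_rate=0.35, n_thematic=2):
        print(line)
```

Ties between equally scored sentences are broken in favour of the earlier sentence, a design choice made here only to keep the output deterministic. In the approach the abstract describes, the plain frequency count would be replaced by a combination of several features.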