Academic Paper

Research on the Applicability of Benford’s Law in Chinese Texts
Document Type
Conference
Source
2020 2nd International Conference on Artificial Intelligence and Advanced Manufacture (AIAM). :13-17 Oct, 2020
Subject
Computing and Processing
Statistical analysis
Information processing
Probability distribution
Entropy
Quality assessment
Artificial intelligence
Periodic structures
Benford’s law
Zipf’s law
relative entropy
corpus quality assessment
Language
Abstract
This paper investigates the applicability of Benford’s law to Chinese texts. First, a Chinese corpus was collected and segmented into words. The first-digit distributions of the frequencies were calculated for words, low-frequency words, and single characters, and the relative entropy (Kullback-Leibler distance) between each distribution and the general Benford’s law was computed. Second, the parameter value range of the Generalized Benford’s law was studied and, because Zipf’s law is applicable only to large amounts of data, a statistical analysis of small-scale data was carried out. Then, the first-digit probabilities of the frequencies of single-character data were analyzed experimentally to explore the applicability of the Generalized Benford’s law to single-character data. Finally, the applicability of Benford’s law to an artificially modified corpus was investigated. The results show that words and characters in Chinese texts conform to Benford’s law, that Benford’s law overcomes the data-set size limitation of Zipf’s law, and that the Generalized Benford’s law can discriminate the natural quality of a corpus, which has practical significance for Chinese information processing.
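The measurement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the text is already segmented into a token list (the sample `tokens` is hypothetical), it uses the standard Benford probabilities log10(1 + 1/d) rather than the Generalized Benford's law, whose parameterization the abstract does not give, and it assumes the relative entropy is taken from the observed distribution to the Benford distribution.

```python
import math
from collections import Counter


def benford_probs():
    """Standard Benford probabilities for leading digits 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_distribution(frequencies):
    """Share of each leading digit (1..9) among positive integer frequencies."""
    digits = Counter(int(str(f)[0]) for f in frequencies if f > 0)
    total = sum(digits.values())
    if total == 0:
        raise ValueError("no positive frequencies supplied")
    return [digits.get(d, 0) / total for d in range(1, 10)]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; zero terms of p are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical, already-segmented sample; a real corpus would be far larger.
tokens = ["的", "我", "的", "是", "我", "的"]
word_freq = Counter(tokens)  # frequency of each word or character

observed = first_digit_distribution(word_freq.values())
kl = relative_entropy(observed, benford_probs())
print(observed)
print(kl)
```

A smaller divergence indicates a first-digit distribution closer to Benford's law. The same functions apply to single-character frequencies if each character is treated as a token.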