Academic Article

Address matching using machine learning methods: An application to register-based census.
Document Type
Article
Source
Statistical Journal of the IAOS. 2024, Vol. 40 Issue 1, p25-40. 16p.
Subject
*CENSUS
*STATISTICAL matching
*CLASSIFICATION algorithms
*STREET addresses
*STATISTICS
*MACHINE learning
Language
ISSN
1874-7655
Abstract
Today, most activities of statistical offices need to be adapted to the modernization policies of the national statistical system. The application of machine learning techniques is therefore mandatory for the main activities of statistical centers, including coding business activities, address matching, prediction of response propensities, and many others. One common application of machine learning in official statistics is matching a statistical address to a postal address in order to link register-based and traditional censuses, with the aim of providing time series census information. Since there is no unique identifier to directly map records between the different databases, text-based approaches can be applied. This paper investigates a novel application of machine learning, employing text-based learning, to integrate governmental records and census data sources. Additionally, three new machine learning classification methods are proposed. A simulation study was performed to evaluate the robustness of the methods with respect to the degree of duplication and the purity of the texts. Owing to the limitations of the R programming environment on big data sets, all programming was implemented in SAS (Statistical Analysis System) software. [ABSTRACT FROM AUTHOR]