Academic Article

Selection of Appropriate Symbolic Regression Models Using Statistical and Dynamic System Criteria: Example of Waste Gasification
Document Type
article
Source
Axioms, Vol 11, Iss 9, p 463 (2022)
Subject
symbolic regression
Mean Square Error
Pearson Correlation Coefficient
oscillations in solutions
dynamic system criteria
waste gasification
Mathematics
QA1-939
Language
English
ISSN
2075-1680
Abstract
In this paper, we analyze interpretable models discovered by symbolic regression from real gasification datasets of the project “Centre for Energy and Environmental Technologies” (CEET). Two statistical metrics are usually used to quantify the accuracy of CEET models against the input data: Mean Square Error (MSE) and the Pearson Correlation Coefficient (PCC). However, if the testing points and the points used to construct the models are not chosen randomly from the continuum of the input variable, but instead from a limited number of discrete input points, the behavior of the model between those points may well fail to reflect the physics of the modelled phenomenon. For example, the developed model can oscillate unexpectedly between the points used, and the usual statistical metrics cannot detect these anomalies. By using dynamic system criteria in addition to statistical metrics, such suspicious models, which do not fit the expected behavior, can be automatically detected and discarded. This communication presents a universal method based on dynamic system criteria that can identify suitable models among all those that score well on the statistical metrics. The dynamic system criteria measure the complexity of the candidate models using approximate and sample entropy. The examples are given for waste gasification, where the output data (the percentage of each particular gas in the produced mixture) are given for only six values of the input data (the temperature in the chamber in which the process takes place). In such cases, instead of producing the expected simple spline-like curves, artificial intelligence tools can produce inappropriate oscillatory curves with sharp peaks, owing to the known tendency of symbolic regression to produce overfitted and relatively more complex models even when the underlying physical model is simple.
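The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and it does not use CEET data: the six input points, the two candidate models (`smooth_model`, `oscillatory_model`), the dense grid, and the parameters `m=2` and `r_factor=0.2` are all assumptions chosen for illustration. Both hypothetical models agree at the six input points, so MSE and PCC cannot tell them apart, while a textbook sample entropy evaluated on a dense grid between those points should separate them.

```python
import math

def mse(pred, obs):
    """Mean Square Error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def pearson(u, v):
    """Pearson Correlation Coefficient of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy with tolerance r = r_factor * std (assumed defaults)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same n - m starting positions for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return math.log(b / a)

# Hypothetical candidate models (NOT taken from the paper).
def smooth_model(x):
    return 10.0 + 4.0 * x

def oscillatory_model(x):
    # The sine term vanishes at integer x, so both models agree at the inputs.
    return smooth_model(x) + 3.0 * math.sin(4.0 * math.pi * x)

# Six synthetic input points standing in for the six temperatures.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
observed = [smooth_model(x) for x in inputs]  # synthetic "measurements"

smooth_at_inputs = [smooth_model(x) for x in inputs]
osc_at_inputs = [oscillatory_model(x) for x in inputs]
mse_smooth, mse_osc = mse(smooth_at_inputs, observed), mse(osc_at_inputs, observed)
pcc_smooth, pcc_osc = pearson(smooth_at_inputs, observed), pearson(osc_at_inputs, observed)

# Dense grid between the input points exposes the behavior the metrics miss.
grid = [5.0 * i / 200 for i in range(201)]
smooth_entropy = sample_entropy([smooth_model(x) for x in grid])
osc_entropy = sample_entropy([oscillatory_model(x) for x in grid])

print("MSE:", mse_smooth, mse_osc)
print("PCC:", pcc_smooth, pcc_osc)
print("Sample entropy:", smooth_entropy, osc_entropy)
```

In this sketch the two models are statistically indistinguishable at the six input points, yet the oscillatory one yields the higher sample entropy on the dense grid, which is the kind of signal a complexity-based criterion could use to reject it. The grid density and tolerance influence the entropy values, so they would need to be chosen with care in any real application.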