Academic Paper

Enabling Generative AI to Produce SQL Statements: A Framework for the Auto-Generation of Knowledge Based on EBNF Context-Free Grammars
Document Type
Periodical
Source
IEEE Access, IEEE. 11:123543-123564 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Structured Query Language
Prototypes
Grammar
Artificial intelligence
Measurement
Fuzzing
Semantics
Generative adversarial networks
ANTLR4
ATN
automata
EBNF grammar
generative AI
parser generator
SQL
Language
ISSN
2169-3536
Abstract
This paper is motivated by the need to generate Structured Query Language (SQL) statements that are high-quality in both syntax and semantics and that serve a concrete, predefined, and well-known aim, for example, SQL statements capable of detecting a cyber-attack from a set of metrics available in a database table. Two solutions are needed to achieve this: a tool that generates syntactically valid SQL statements, and an artificial intelligence (AI) algorithm that guides the semantics of the generation toward the best statements for the intended purpose. The main contribution of this manuscript is the first of these solutions. Concretely, this paper proposes a tool that generates syntactically valid sentences for any language defined as an ANTLR4 EBNF (Extended Backus-Naur Form) grammar. The paper also provides a methodology for arriving at an EBNF grammar suited to generation, addressing the ambiguity and recursion concerns that arise directly from the generation process. The paper further implements a prototype that uses ANTLR4’s recognizer and its Augmented Transition Network (ATN) to generate language from EBNF grammars. The design and implementation logic are described in depth, highlighting areas of interest for AI integration. The prototype readily generated syntactically valid SQL statements at various depths, with problems becoming more apparent under exponential recursive growth. The mitigation controls for such scenarios proved successful: they completed the recursion while also moving the pushdown automaton forward until query completion. Experimental validation was performed against an SQL EBNF grammar by feeding the generated SQL statements into an SQL parser to validate their syntax.
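The general idea in the abstract, grammar-driven generation with a control that forces recursion to terminate, can be illustrated with a minimal sketch. This is not the paper's prototype: it does not use ANTLR4 or its ATN. It walks a small hand-written grammar held in a Python dictionary. The grammar, the table name `metrics`, the function names, and the use of SQLite as the validating parser are all assumptions made for illustration only.

```python
import random
import sqlite3

# Hypothetical toy grammar (NOT the paper's SQL grammar):
# nonterminal -> list of alternatives, each a list of symbols.
GRAMMAR = {
    "query": [["SELECT", "columns", "FROM", "table", "where"]],
    "columns": [["*"], ["column"], ["column", ",", "columns"]],
    "column": [["id"], ["name"]],
    "table": [["metrics"]],
    "where": [[], ["WHERE", "cond"]],
    "cond": [["column", "=", "1"], ["cond", "AND", "cond"], ["(", "cond", ")"]],
}


def alt_height(alt, heights):
    """Derivation height of one alternative, given per-nonterminal heights."""
    return 1 + max((heights[s] for s in alt if s in GRAMMAR), default=0)


def min_heights():
    """Fixed-point computation of the shallowest derivation per nonterminal."""
    heights = {nt: float("inf") for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            best = min(alt_height(a, heights) for a in alts)
            if best < heights[nt]:
                heights[nt] = best
                changed = True
    return heights


HEIGHTS = min_heights()


def expand(symbol, budget, rng):
    """Randomly expand a symbol, using only alternatives that fit the budget."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token
    alts = GRAMMAR[symbol]
    fitting = [a for a in alts if alt_height(a, HEIGHTS) <= budget]
    if not fitting:
        # Budget too small: fall back to the shallowest alternative.
        fitting = [min(alts, key=lambda a: alt_height(a, HEIGHTS))]
    tokens = []
    for s in rng.choice(fitting):
        tokens.extend(expand(s, budget - 1, rng))
    return tokens


def generate_sql(max_depth, rng):
    """Generate one statement whose derivation tree is depth-bounded."""
    return " ".join(expand("query", max_depth, rng))


def is_valid_sql(sql):
    """Check syntax by asking SQLite to compile (not run) the statement."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE metrics (id INTEGER, name TEXT)")
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        statement = generate_sql(6, rng)
        print(statement, is_valid_sql(statement))
```

The depth budget is a simple stand-in for a recursion mitigation control: recursive alternatives such as `cond AND cond` are only chosen while enough budget remains to finish every open rule, so generation always reaches a complete statement. The random choice in `expand` is the point where a guiding AI algorithm could be substituted.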