Academic paper

A Large Language Model approach to SQL-to-Text Generation
Document Type
Conference
Source
2024 IEEE International Conference on Consumer Electronics (ICCE), pp. 1-4, Jan. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
General Topics for Engineers
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Signal Processing and Analysis
Transportation
Structured Query Language
Codes
Databases
Chatbots
Task analysis
Consumer electronics
Deep Learning
Natural Language Processing
SQL-to-Text
Pre-trained Models
Large Language Models
Language
English
ISSN
2158-4001
Abstract
Generating relevant explanations from a structured code representation, such as SQL, is a challenging task. Tackling SQL-to-text, and more specifically the SQL-explanation problem, benefits both non-technical and technical users. Automatic explanations written in human language facilitate understanding of a query’s logical structure and help developers better document and learn SQL code. The approaches in this niche are diverse: some involve sequence-to-sequence models, while others use graph-to-sequence models to generate explanations. However, given the latest advances in Large Language Models (LLMs) and the relatively little attention paid to the SQL-to-text problem, we investigate a new LLM-based generative approach to infer the logical structure of a query, including its columns, tables, and relations. We categorize our research on SQL-explanation as a subtask of SQL-to-text to distinguish it from the translation of SQL code into natural language questions. Experiments were conducted with the open-source Falcon LLM and compared against T5 and Graph2Seq models. The results show that Falcon outperforms previous models, achieving 70% accuracy under human evaluation on the Spider dataset and a competitive 75% accuracy under human evaluation on the WikiSQL dataset.
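The abstract describes prompting a pre-trained LLM to produce a plain-language explanation of a SQL query. A minimal sketch of how such a prompt might be assembled is shown below; the template wording and the function name are hypothetical illustrations, not the paper's actual setup:

```python
# Sketch of an SQL-to-text (SQL-explanation) prompt builder.
# The instruction template below is a hypothetical example, not the
# prompt used in the paper.

def build_explanation_prompt(sql: str) -> str:
    """Wrap a SQL query in an instruction asking an LLM to explain it."""
    return (
        "Explain in plain English what the following SQL query does, "
        "mentioning the tables, columns and relations it uses.\n\n"
        f"SQL: {sql}\n"
        "Explanation:"
    )

# Example query in the style of the Spider dataset
query = "SELECT name FROM singer WHERE age > 30 ORDER BY age"
prompt = build_explanation_prompt(query)
print(prompt)

# The prompt would then be sent to a generative model such as Falcon or
# T5 (e.g. via a text-generation API); the model call is omitted here
# to keep the sketch self-contained.
```

The generated text would then be evaluated by human judges, as the abstract's accuracy figures suggest.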