Researchers at IBM address the difficulty of extracting valuable insights from large databases, especially in businesses. The massive volume and variety of data make it hard for employees to locate the information they need, and writing the SQL required to retrieve data across multiple schemas and tables can be complex. This difficulty keeps businesses from fully leveraging their data when making strategic decisions.
Current methods for querying databases rely heavily on SQL, the dominant language for database interactions. However, SQL proficiency is typically limited to a small group of data professionals within an organization, which restricts broader access to data insights. Researchers at IBM proposed a Granite code model, ExSL+granite-20b-code, to simplify data analysis by enabling generative AI to write SQL queries from natural language questions. The proposed model achieved top performance on the BIRD benchmark, which measures the effectiveness of AI models in translating natural language into SQL.
ExSL+granite-20b-code incorporates an extractive schema-linking technique to understand database organization and retrieve relevant data tables and columns. The researchers tuned three versions of the Granite 20B model to optimize the process of identifying pertinent data columns, establishing linkages between data values, and generating accurate SQL code.
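To make the idea of schema linking concrete, here is a minimal Python sketch. It is not IBM's method: the article describes a tuned Granite model doing the extraction, whereas this stand-in uses plain keyword overlap between the question and the table and column names. The schema, the `tokens` and `link_schema` helpers, and the sample question are all hypothetical.

```python
import re

# Hypothetical schema: table name -> list of column names.
SCHEMA = {
    "customers": ["customer_id", "name", "country"],
    "orders": ["order_id", "customer_id", "order_date", "total"],
    "products": ["product_id", "title", "price"],
}


def tokens(text):
    """Lowercase word tokens, with a crude plural strip so 'orders' matches 'order'."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}


def link_schema(question, schema):
    """Return {table: [columns]} whose names share a token with the question.

    Keyword overlap is only a stand-in for the tuned model in the article.
    """
    q = tokens(question)
    linked = {}
    for table, columns in schema.items():
        hits = [c for c in columns if tokens(c) & q]
        if hits or tokens(table) & q:
            linked[table] = hits
    return linked


question = "What is the total of all orders placed by customers in each country?"
linked = link_schema(question, SCHEMA)
```

In this toy case the `products` table shares no words with the question, so it is dropped and only `customers` and `orders` would be passed to later steps. A learned linker would presumably also handle synonyms and paraphrases that simple token matching misses.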
IBM’s approach to improving text-to-SQL generation involves a three-step process: schema linking, content linking, and SQL code generation. The schema linking step matches keywords in the question to relevant data tables and columns, and an extractive method speeds up this process significantly. In the content linking step, the resulting sub-tables are converted into string representations and passed to a second model instance, which is trained to generate multiple pieces of SQL code and compares columns with specific values relevant to the query. Finally, a third instance of the Granite model generates SQL queries and selects the best ones by analyzing their execution results.
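Two of these steps can be sketched in a few lines of Python with the standard `sqlite3` module: turning a sub-table into a string, and choosing among candidate queries by running them. This is an illustration under stated assumptions, not IBM's implementation. The article does not say how execution results are analyzed, so the majority vote over result sets used here is an assumption, and the table, the helper names, and the hard-coded candidate queries are hypothetical stand-ins for model output.

```python
import sqlite3
from collections import Counter


def serialize_subtable(conn, table, columns, limit=3):
    """Turn a few rows of the linked columns into a string a model could read."""
    cols = ", ".join(columns)
    rows = conn.execute(f"SELECT {cols} FROM {table} LIMIT {limit}").fetchall()
    lines = [f"{table}({cols})"] + [" | ".join(map(str, r)) for r in rows]
    return "\n".join(lines)


def pick_by_execution(conn, candidates):
    """Run each candidate; return the first one whose result set is most common.

    Majority voting is an assumption; the article only says results are analyzed.
    """
    results = {}
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # discard candidates that fail to execute
        results[sql] = tuple(sorted(rows))
    if not results:
        return None
    winner, _ = Counter(results.values()).most_common(1)[0]
    return next(sql for sql, rows in results.items() if rows == winner)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, country TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'DE', 10.0), (2, 'DE', 5.0), (3, 'FR', 7.5);
""")

subtable_text = serialize_subtable(conn, "orders", ["country", "total"])

# Hard-coded stand-ins for SQL a model might generate.
candidates = [
    "SELECT country, SUM(total) FROM orders GROUP BY country",
    "SELECT o.country, SUM(o.total) FROM orders AS o GROUP BY o.country",
    "SELECT country, COUNT(*) FROM orders GROUP BY country",   # wrong aggregate
    "SELECT country, SUM(totl) FROM orders GROUP BY country",  # typo, fails
]
best = pick_by_execution(conn, candidates)
```

Here the misspelled query is discarded because it raises an error, and the two `SUM` queries agree on their output, so one of them is selected over the lone `COUNT(*)` variant. Voting favors answers that several independent generations converge on, but it cannot rescue a case where most candidates are wrong in the same way.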
IBM’s solution stood out in the BIRD benchmark for both accuracy and execution speed. It scored 80 on code execution speed, just below the 90 earned by human engineers, while other AI systems scored 65. The extractive method for schema linking and the generative approach for content linking were key factors in this performance. Although the system answered only 68% of questions correctly, compared with 93% for human engineers, its performance represents a significant step forward in automating SQL generation.
In conclusion, IBM has made significant advancements in leveraging generative AI to simplify data querying for businesses. By reducing the need for SQL proficiency, IBM’s text-to-SQL generator offers a promising way to give more employees access to data insights.
The post IBM Researchers Propose ExSL+granite-20b-code: A Granite Code Model to Simplify Data Analysis by Enabling Generative AI to Write SQL Queries from Natural Language Questions appeared first on MarkTechPost.
[Source: AI Techpark]