Generative AI has opened up significant new possibilities across a wide range of applications. We are seeing numerous uses, including text generation, code generation, summarization, translation, chatbots, and more. One area that is evolving rapidly is using natural language processing (NLP) to unlock new opportunities for accessing data through intuitive SQL queries. Instead of writing complex technical code, business users and data analysts can ask questions about data and insights in plain language. The primary goal is to automatically generate SQL queries from natural language text: the text input is transformed into a structured representation, and from this representation a SQL query that can be run against a database is created.
In this post, we provide an introduction to text to SQL (Text2SQL) and explore use cases, challenges, design patterns, and best practices. Specifically, we discuss the following:

- The value of Text2SQL and its key components
- Prompt engineering techniques and how they compare with fine-tuning
- Optimization techniques and best practices
- Architecture patterns, from basic prompt engineering to fine-tuning and RAG
Today, a large amount of data lives in traditional data analytics platforms, data warehouses, and databases, which may not be easy for the majority of organization members to query or understand. The primary goal of Text2SQL is to make querying databases more accessible to non-technical users, who can provide their queries in natural language.
Natural language to SQL enables business users to analyze data and get answers by typing or speaking questions in plain language, such as "What were the total sales last month?" or "How many new customers signed up this quarter?"
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) through a single API, enabling you to easily build and scale generative AI applications. It can be used to generate SQL queries from questions like those above, query an organization's structured data, and generate natural language responses from the query results.
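As a minimal sketch of that flow (the model ID, schema, and question here are illustrative assumptions, not prescriptions from this post), a Bedrock-backed SQL generation call might look like the following:

```python
# A sketch of generating SQL with a model on Amazon Bedrock.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

question = "What was the total revenue last quarter?"
schema = "CREATE TABLE sales (id INT, amount DECIMAL(10, 2), sale_date DATE);"

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{
        "role": "user",
        "content": (
            f"Given this database schema:\n{schema}\n"
            f"Write a SQL query that answers: {question}\n"
            "Return only the SQL statement."
        ),
    }],
})

# The model ID is an assumption; any Bedrock text model can be substituted.
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=body,
)
generated_sql = json.loads(response["body"].read())["content"][0]["text"]
print(generated_sql)
```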
Text-to-SQL systems involve several stages to convert natural language queries into runnable SQL:

- Interpreting the user's natural language question
- Grounding the question in the relevant database schema (tables and columns)
- Generating a candidate SQL query
- Validating the query and running it against the database
- Returning the results, optionally summarized as a natural language response
One remarkable capability of large language models (LLMs) is the generation of code, including Structured Query Language (SQL) for databases. These models can be used to understand a natural language question and generate a corresponding SQL query as output. LLMs benefit from in-context learning and fine-tuning, and their accuracy improves as more examples and data are provided.
The following diagram illustrates a basic Text2SQL flow.
The prompt is crucial when using LLMs to translate natural language into SQL queries, and there are several important considerations for prompt engineering.
Effective prompt engineering is key to developing natural language to SQL systems. Clear, straightforward prompts provide better instructions for the language model. Providing context that the user is requesting a SQL query along with relevant database schema details enables the model to translate the intent accurately. Including a few annotated examples of natural language prompts and corresponding SQL queries helps guide the model to produce syntax-compliant output. Additionally, incorporating Retrieval Augmented Generation (RAG), where the model retrieves similar examples during processing, further improves the mapping accuracy. Well-designed prompts that give the model sufficient instruction, context, examples, and retrieval augmentation are crucial for reliably translating natural language into SQL queries.
The following is an example of a baseline prompt with code representation of the database from the whitepaper Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies.
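A sketch along the lines of that paper's code representation format, where the schema is provided as CREATE TABLE statements and the prompt ends with SELECT to cue the model, might look like this (the table and question are illustrative, not taken from the paper):

```
/* Given the following database schema: */
CREATE TABLE singer (
  singer_id INT PRIMARY KEY,
  name TEXT,
  country TEXT,
  age INT
);

/* Answer the following: How many singers are from France? */
SELECT
```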
As illustrated in this example, prompt-based few-shot learning provides the model with a handful of annotated examples in the prompt itself. This demonstrates the target mapping between natural language and SQL for the model. Typically, the prompt would contain around 2–3 pairs showing a natural language query and the equivalent SQL statement. These few examples guide the model to generate syntax-compliant SQL queries from natural language without requiring extensive training data.
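A sketch of such a few-shot prompt, with an invented schema and question/SQL pairs for illustration, could look like the following:

```
/* Given the following database schema: */
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  total DECIMAL(10, 2),
  order_date DATE
);

/* Answer the following: How many orders were placed in 2023? */
SELECT COUNT(*) FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';

/* Answer the following: What is the average order total? */
SELECT AVG(total) FROM orders;

/* Answer the following: Which customer placed the most orders? */
SELECT
```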
When building natural language to SQL systems, we often get into the discussion of whether fine-tuning the model is the right technique or whether effective prompt engineering is sufficient. Both approaches can be considered and selected based on your requirements.
An effective approach for text-to-SQL models is to first start with a baseline LLM without any task-specific fine-tuning. Well-crafted prompts can then be used to adapt and drive the base model to handle the text-to-SQL mapping. This prompt engineering allows you to develop the capability without needing to do fine-tuning. If prompt engineering on the base model doesn’t achieve sufficient accuracy, fine-tuning on a small set of text-SQL examples can then be explored along with further prompt engineering.
The combination of fine-tuning and prompt engineering may be required if prompt engineering on the raw pre-trained model alone doesn’t meet requirements. However, it’s best to initially attempt prompt engineering without fine-tuning, because this allows rapid iteration without data collection. If this fails to provide adequate performance, fine-tuning alongside prompt engineering is a viable next step. This overall approach maximizes efficiency while still allowing customization if purely prompt-based methods are insufficient.
Optimization and best practices are essential for making a Text2SQL system effective while keeping resource usage under control. These techniques help improve performance, control costs, and achieve higher-quality outcomes.
When developing text-to-SQL systems using LLMs, optimization techniques can improve performance and efficiency. The following are some key areas to consider:

- Caching: Cache parsed and validated SQL queries for frequently asked questions to avoid repeated model invocations
- Monitoring: Track query latency, accuracy, and failure patterns to identify where prompts or schemas need tuning
- Materialized views: Precompute commonly requested aggregations so generated queries run against smaller, faster views
- Scheduled refreshing: Refresh materialized views on a schedule to keep query results current
- Central data catalog: Maintain a single catalog of schemas and metadata so the model can select the right databases, tables, and columns
By applying optimization best practices like caching, monitoring, materialized views, scheduled refreshing, and a central catalog, you can significantly improve the performance and efficiency of text-to-SQL systems using LLMs.
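As a minimal sketch of the caching idea, the application can memoize generated SQL so repeated questions skip a model call entirely (generate_sql here is a placeholder standing in for the Bedrock call sketched earlier):

```python
from functools import lru_cache

def text_to_sql(question: str, schema: str) -> str:
    # Normalize the question so trivially different phrasings
    # hit the same cache entry.
    return _generate_cached(question.strip().lower(), schema)

@lru_cache(maxsize=1024)
def _generate_cached(question: str, schema: str) -> str:
    # generate_sql is a placeholder for the LLM call sketched earlier.
    return generate_sql(question, schema)
```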
Let’s look at some architecture patterns that can be implemented for a text-to-SQL workflow.
The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering.
In this pattern, the user creates a prompt for few-shot learning that provides the model with annotated examples in the prompt itself, including the table and schema details and some sample queries with their results. The LLM uses the provided prompt to return the AI-generated SQL, which is validated and then run against the database to get the results. This is the most straightforward pattern for getting started with prompt engineering. For this, you can use Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API, along with a broad set of capabilities for building generative AI applications with security, privacy, and responsible AI, or JumpStart Foundation Models, which offers state-of-the-art foundation models for use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval.
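A minimal sketch of the validate-then-run step in this pattern, using SQLite for illustration (a production system would target its own database engine and apply a stricter validator):

```python
import sqlite3

def validate_and_run(sql: str, conn: sqlite3.Connection) -> list:
    statement = sql.strip().rstrip(";")
    # Allow only a single read-only statement.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only single SELECT statements are allowed")
    # EXPLAIN QUERY PLAN parses and plans the query without reading data,
    # surfacing syntax errors and unknown tables or columns early.
    conn.execute(f"EXPLAIN QUERY PLAN {statement}")
    return conn.execute(statement).fetchall()
```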
The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering and fine-tuning.
This flow is similar to the previous pattern, which relies mostly on prompt engineering, but adds a fine-tuning flow on a domain-specific dataset. The fine-tuned LLM can then generate SQL queries with minimal in-context examples in the prompt. For this, you can use SageMaker JumpStart to fine-tune an LLM on a domain-specific dataset in the same way you would train and deploy any model on Amazon SageMaker.
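A minimal sketch of what that fine-tuning step might look like with the SageMaker Python SDK (the model ID, instance type, and S3 path are placeholder assumptions; check JumpStart for models that support fine-tuning and the dataset format they expect):

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Model ID and instance type are placeholders.
estimator = JumpStartEstimator(
    model_id="huggingface-llm-falcon-7b-bf16",
    instance_type="ml.g5.12xlarge",
)

# Training data: natural language/SQL pairs uploaded to S3 beforehand,
# in the format the chosen model expects.
estimator.fit({"training": "s3://my-bucket/text2sql-dataset/"})

# Deploy the fine-tuned model behind a real-time endpoint.
predictor = estimator.deploy()
```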
The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering and RAG.
In this pattern, we use Retrieval Augmented Generation with vector embedding stores, such as Amazon Titan Embeddings or Cohere Embed on Amazon Bedrock, built from a central data catalog, such as the AWS Glue Data Catalog, of the databases within an organization. The vector embeddings are stored in vector databases such as the vector engine for Amazon OpenSearch Serverless, Amazon Relational Database Service (Amazon RDS) for PostgreSQL with the pgvector extension, or Amazon Kendra. LLMs use the vector embeddings to select the right database, tables, and columns more quickly when creating SQL queries. RAG is helpful when the data and relevant information that the LLM needs are spread across multiple separate database systems and the LLM must be able to search or query data from all of them. Providing vector embeddings of a centralized or unified data catalog to the LLM results in more accurate and comprehensive information being returned.
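A minimal sketch of the retrieval step, embedding per-table descriptions with Amazon Titan Embeddings and ranking the closest tables in memory (the model ID is an assumption, and a real deployment would persist vectors in one of the stores above rather than in a Python dictionary):

```python
import json

import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    # Amazon Titan Embeddings via Bedrock; the model ID is an assumption.
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

# One description per table, as it might appear in a central data catalog.
tables = {
    "sales": "sales(id, amount, sale_date): one row per completed sale",
    "customers": "customers(id, name, region): customer master data",
}
index = {name: embed(desc) for name, desc in tables.items()}

def top_tables(question: str, k: int = 2) -> list[str]:
    # Rank tables by cosine similarity to the question embedding.
    q = embed(question)
    scores = {
        name: float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec)))
        for name, vec in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```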
In this post, we discussed how to generate value from enterprise data using natural language to SQL generation. We looked into key components, optimization techniques, and best practices. We also walked through architecture patterns ranging from basic prompt engineering to fine-tuning and RAG. To learn more, refer to Amazon Bedrock to easily build and scale generative AI applications with foundation models.