Large language model (LLM) agents are programs that extend the capabilities of standalone LLMs with 1) access to external tools (APIs, functions, webhooks, plugins, and so on), and 2) the ability to plan and execute tasks in a self-directed fashion. Often, LLMs need to interact with other software, databases, or APIs to accomplish complex tasks. For example, an administrative chatbot that schedules meetings would require access to employees’ calendars and email. With access to tools, LLM agents can become more powerful—at the cost of additional complexity.
In this post, we introduce LLM agents and demonstrate how to build and deploy an e-commerce LLM agent using Amazon SageMaker JumpStart and AWS Lambda. The agent will use tools to provide new capabilities, such as answering questions about returns (“Is my return rtn001 processed?”) and providing updates about orders (“Could you tell me if order 123456 has shipped?”). These new capabilities require LLMs to fetch data from multiple data sources (orders, returns) and perform retrieval augmented generation (RAG).
To power the LLM agent, we use a Flan-UL2 model deployed as a SageMaker endpoint and data retrieval tools built with AWS Lambda. The agent can subsequently be integrated with Amazon Lex and used as a chatbot inside websites or Amazon Connect. We conclude the post with items to consider before deploying LLM agents to production. For a fully managed experience for building LLM agents, AWS also provides the Agents for Amazon Bedrock feature (in preview).
LLM agents are programs that use LLMs to decide when and how to use tools as necessary to complete complex tasks. With tools and task planning abilities, LLM agents can interact with outside systems and overcome traditional limitations of LLMs, such as knowledge cutoffs, hallucinations, and imprecise calculations. Tools can take a variety of forms, such as API calls, Python functions, or webhook-based plugins. For example, an LLM can use a “retrieval plugin” to fetch relevant context and perform RAG.
So what does it mean for an LLM to pick tools and plan tasks? There are numerous approaches (such as ReAct, MRKL, Toolformer, HuggingGPT, and Transformer Agents) to using LLMs with tools, and advancements are happening rapidly. But one simple way is to prompt an LLM with a list of tools and ask it to determine 1) if a tool is needed to satisfy the user query, and if so, 2) select the appropriate tool. Such a prompt typically looks like the following example and may include few-shot examples to improve the LLM’s reliability in picking the right tool.
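A minimal sketch of such a prompt, reusing the meeting-scheduling example from earlier, is shown below. The tool names (CalendarAPI, EmailAPI), the no_tool convention, and the exact wording are illustrative assumptions, not a prompt from this solution:

```python
# Illustrative tool-selection prompt; tool names and wording are assumptions.
TOOL_SELECTION_PROMPT = """You can use the following tools:
- CalendarAPI: read or update an employee's calendar.
- EmailAPI: send an email on the employee's behalf.

First decide whether a tool is needed to answer the request. If a tool is
needed, reply with only that tool's name. Otherwise reply with "no_tool".

Request: {user_input}
Tool:"""

def build_tool_selection_prompt(user_input: str) -> str:
    """Fill the user request into the template before sending it to the LLM."""
    return TOOL_SELECTION_PROMPT.format(user_input=user_input)
```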
More complex approaches involve using a specialized LLM that can directly decode “API calls” or “tool use,” such as GorillaLLM. Such fine-tuned LLMs are trained on API specification datasets to recognize and predict API calls based on instructions. Often, these LLMs require some metadata about available tools (descriptions, YAML, or JSON schema for their input parameters) in order to output tool invocations. This approach is taken by Agents for Amazon Bedrock and OpenAI function calling. Note that LLMs generally need to be sufficiently large and complex in order to show tool selection ability.
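As a hedged illustration of such tool metadata (not any particular provider's exact format), an input-parameter schema for an orders tool might look like the following:

```python
# Illustrative tool metadata; the field layout loosely follows common
# JSON-schema-style function-calling formats and is an assumption here.
orders_tool_spec = {
    "name": "OrdersAPI",
    "description": "Fetch the shipping status of an order, given its order ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "orderId": {
                "type": "string",
                "description": "The order identifier, for example 123456.",
            }
        },
        "required": ["orderId"],
    },
}
```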
Assuming task planning and tool selection mechanisms are chosen, a typical LLM agent program works in the following sequence:

1. Receive a user request (for example, “Could you tell me if order 123456 has shipped?”) from some client application.
2. Plan the next action and select a tool. The LLM is prompted to suggest a tool name such as OrdersAPI from a predefined list of available tools and their descriptions. Alternatively, the LLM could be instructed to directly generate an API call with input parameters, such as OrdersAPI(12345).
3. Invoke the selected tool, parse its output, and use the returned context to generate the final answer or to plan the next step.

Different agent frameworks execute the previous program flow differently. For example, ReAct combines tool selection and final answer generation into a single prompt, as opposed to using separate prompts for tool selection and answer generation. Also, this logic can be run in a single pass or in a while statement (the “agent loop”), which terminates when the final answer is generated, an exception is thrown, or a timeout occurs. What remains constant is that agents use the LLM as the centerpiece to orchestrate planning and tool invocations until the task terminates. Next, we show how to implement a simple agent loop using AWS services.
For this blog post, we implement an e-commerce support LLM agent that provides two functionalities powered by tools:

- Returns status retrieval – answer questions about returns, such as “Is my return rtn001 processed?”
- Order status retrieval – provide updates about orders, such as “Could you tell me if order 123456 has shipped?”

The agent effectively uses the LLM as a query router. Given a query (“What is the status of order 123456?”), the agent selects the appropriate retrieval tool to query across multiple data sources (that is, returns and orders). We accomplish query routing by having the LLM pick among multiple retrieval tools, which are responsible for interacting with a data source and fetching context. This extends the simple RAG pattern, which assumes a single data source.
Both retrieval tools are Lambda functions that take an ID (orderId or returnId) as input, fetch a JSON object from the data source, and convert the JSON into a human-friendly string that is suitable for use by the LLM. In a real-world scenario, the data source could be a highly scalable NoSQL database such as DynamoDB, but this solution uses simple Python dictionaries with sample data for demo purposes.
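The following is a minimal sketch of what such a retrieval tool Lambda could look like; the sample data, event fields, and wording of the returned string are illustrative assumptions rather than the solution's exact code:

```python
# Illustrative orders retrieval tool (Lambda handler). The sample data,
# event shape, and response wording are assumptions for demonstration.
SAMPLE_ORDERS = {
    "123456": {"status": "shipped", "carrier": "UPS", "eta": "2023-09-01"},
}

def lambda_handler(event, context):
    order_id = event.get("orderId")
    order = SAMPLE_ORDERS.get(order_id)
    if order is None:
        return f"No order found with ID {order_id}."
    # Convert the raw record into a human-friendly string for the LLM.
    return (
        f"Order {order_id} has status '{order['status']}', shipped via "
        f"{order['carrier']}, with estimated delivery on {order['eta']}."
    )
```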
Additional functionalities can be added to the agent by adding retrieval tools and modifying the prompts accordingly. The agent can be tested as a standalone service that integrates with any UI over HTTP, which can be done easily with Amazon Lex.
Here are some additional details about the key components:

- LLM endpoint – We use SageMaker JumpStart to deploy the Flan-UL2 model. SageMaker JumpStart makes it easy to deploy LLM inference endpoints to dedicated SageMaker instances. In this post, we use the Flan-UL2 model as-is, without fine-tuning.
- Retrieval tools and fallback – The tools are invoked only when an orderId or returnId can be parsed from the query. Otherwise, we respond with a default message.

Now, let’s dive a bit deeper into the key components: agent orchestrator, task planner, and tool dispatcher.
Below is an abbreviated version of the agent loop inside the agent orchestrator Lambda function. The loop uses helper functions such as task_planner and tool_parser to modularize the tasks. The loop is designed to run at most two times to prevent the LLM from being stuck in an unnecessarily long loop.
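The sketch below illustrates the structure described above. The helper names match those mentioned in the text, but their signatures, the generate_final_answer helper, and the event shape are assumptions rather than the exact listing from this solution:

```python
# Illustrative agent loop for the orchestrator Lambda. Helper signatures,
# generate_final_answer, and the event shape are assumptions.
MAX_LOOP_COUNT = 2  # run at most twice so the LLM cannot get stuck looping

def agent_handler(event, context):
    user_input = event["userInput"]
    retrieved_context = ""
    for _ in range(MAX_LOOP_COUNT):
        # Ask the LLM which retrieval tool (if any) should handle the query.
        tool_prediction = task_planner(user_input)
        tool_name, tool_input = tool_parser(tool_prediction, user_input)
        if tool_name is None:
            break  # no tool selected or no ID parsed; fall back to a default reply
        # Invoke the tool Lambda and clean up its raw response.
        raw_response = tool_dispatch(tool_name, tool_input)
        retrieved_context = output_parser(raw_response)
        if retrieved_context:
            break
    return generate_final_answer(user_input, retrieved_context)
```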
The agent orchestrator uses the task_planner helper to predict a retrieval tool based on user input. For our LLM agent, we simply use prompt engineering and few-shot prompting to teach the LLM this task in context. More sophisticated agents could use a fine-tuned LLM for tool prediction, which is beyond the scope of this post.
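A minimal few-shot prompt for this task might look like the following sketch; the exact wording, the tool names (OrdersAPI, ReturnsAPI), and the in-context examples are illustrative assumptions rather than the solution's actual prompt:

```python
# Illustrative few-shot tool-selection prompt used by the task planner;
# wording, tool names, and examples are assumptions.
TASK_PLANNER_PROMPT = """Given a customer question, choose the best tool.
Reply with OrdersAPI, ReturnsAPI, or no_tool.

Question: Could you tell me if order 123456 has shipped?
Tool: OrdersAPI

Question: Is my return rtn001 processed?
Tool: ReturnsAPI

Question: What is the meaning of life?
Tool: no_tool

Question: {user_input}
Tool:"""
```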
The tool dispatch mechanism works via if/else logic to call the appropriate Lambda function depending on the tool’s name. The following is the tool_dispatch helper function’s implementation. It’s used inside the agent loop and returns the raw response from the tool Lambda function, which is then cleaned by an output_parser function.
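A sketch of such a dispatcher is shown below; the payload shapes and the mapping from tool names to the LLMAgentOrdersTool and LLMAgentReturnsTool Lambda functions are assumptions consistent with the rest of the post, not the exact implementation:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def tool_dispatch(tool_name, tool_input):
    """Invoke the Lambda function backing the selected tool (illustrative sketch)."""
    # Simple if/else dispatch on the tool name predicted by the task planner.
    if tool_name == "OrdersAPI":
        function_name = "LLMAgentOrdersTool"
        payload = {"orderId": tool_input}
    elif tool_name == "ReturnsAPI":
        function_name = "LLMAgentReturnsTool"
        payload = {"returnId": tool_input}
    else:
        raise ValueError(f"Unknown tool: {tool_name}")

    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload),
    )
    # Return the raw tool response; output_parser cleans it up afterwards.
    return json.loads(response["Payload"].read())
```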
Important prerequisites – To get started with the deployment, you need to fulfill the following prerequisites:

- Flan-UL2 requires a single ml.g5.12xlarge instance for deployment, which may necessitate increasing resource limits via a support ticket. In our example, we use us-east-1 as the Region, so please make sure to increase the service quota (if needed) in us-east-1.

Deploy using CloudFormation – You can deploy the solution to us-east-1 by launching the provided CloudFormation stack.
Deploying the solution will take about 20 minutes and will create an LLMAgentStack stack, which:

- deploys a SageMaker endpoint using the Flan-UL2 model from SageMaker JumpStart;
- deploys three Lambda functions: LLMAgentOrchestrator, LLMAgentReturnsTool, and LLMAgentOrdersTool; and
- deploys an Amazon Lex bot: Sagemaker-Jumpstart-Flan-LLM-Agent-Fallback-Bot.

The stack deploys an Amazon Lex bot with the name Sagemaker-Jumpstart-Flan-LLM-Agent-Fallback-Bot. The bot can be used to test the agent end-to-end. Here’s an additional comprehensive guide for testing Amazon Lex bots with a Lambda integration, including how the integration works at a high level. But in short, the Amazon Lex bot is a resource that provides a quick UI to chat with the LLM agent running inside the Lambda function that we built (LLMAgentOrchestrator).
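To work with this integration, the orchestrator Lambda must return its answer in the response format that Amazon Lex expects. The sketch below shows roughly what a Lex V2-style fulfillment response could look like; the intent name and field values here are illustrative and should be checked against the Lex documentation:

```python
# Rough sketch of a Lex V2-style fulfillment response returned by the
# orchestrator Lambda; intent name and field values are illustrative.
def format_lex_response(final_answer, intent_name="FallbackIntent"):
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent_name, "state": "Fulfilled"},
        },
        "messages": [
            {"contentType": "PlainText", "content": final_answer},
        ],
    }
```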
The sample test cases to consider are as follows:

- A valid order query (for example, “What is the status of order 123456?”)
- A valid return query (for example, “Is my return rtn003 processed?”)
- An order query with an ID that is not in the orders dataset (for example, “What is the status of order 383833?”)
- A return query with an ID that is not in the returns dataset (for example, “Is my return rtn123 processed?”) – rtn123 does not exist in the returns dataset, and hence the agent should fail gracefully.
- An irrelevant query that mixes a valid ID with an unrelated topic (for example, a question about return rtn001 “on world peace?”)

To run these tests yourself, here are the instructions.
On the Amazon Lex console, open the bot named Sagemaker-Jumpstart-Flan-LLM-Agent-Fallback-Bot. This bot has already been configured to call the LLMAgentOrchestrator Lambda function whenever the FallbackIntent is triggered.

To avoid additional charges, delete the resources created by our solution by following these steps:
1. On the AWS CloudFormation console, delete the stack named LLMAgentStack (or the custom name you picked).
2. Important: double-check that the stack is successfully deleted by ensuring that the Flan-UL2 inference endpoint is removed. On the SageMaker console, verify that the endpoint sm-jumpstart-flan-bot-endpoint no longer exists.

Deploying LLM agents to production requires taking extra steps to ensure reliability, performance, and maintainability. Here are some considerations prior to deploying agents in production:
One consideration is the choice of LLM: we used the Flan-UL2 model without fine-tuning to perform task planning or tool selection. In practice, using an LLM that is fine-tuned to directly output tool or API requests can increase reliability and performance, as well as simplify development. We could fine-tune an LLM on tool selection tasks or use a model that directly decodes tool tokens, such as Toolformer.

In this post, we explored how to build an LLM agent that can utilize multiple tools from the ground up, using low-level prompt engineering, AWS Lambda functions, and SageMaker JumpStart as building blocks. We discussed the architecture of LLM agents and the agent loop in detail. The concepts and solution architecture introduced in this blog post may be appropriate for agents that use a small, predefined set of tools. We also discussed several strategies for using agents in production. Agents for Amazon Bedrock, which is in preview, also provides a managed experience for building agents with native support for agentic tool invocations.