Estimating Amazon Bedrock Costs for a Managed RAG Chatbot
“How much will it cost to run our chatbot on Amazon Bedrock?” This is one of the most frequent questions we hear from customers exploring AI solutions. And it’s no wonder: calculating costs for AI applications can feel like navigating a complex maze of tokens, embeddings, and various pricing models. Whether you’re a solution architect, technical leader, or business decision-maker, understanding these costs is crucial for project planning and budgeting.

In this post, we’ll look at Amazon Bedrock pricing through the lens of a practical, real-world example: building a customer service chatbot. We’ll break down the essential cost components, walk through capacity planning for a mid-sized call center implementation, and provide detailed pricing calculations across different foundation models. By the end of this post, you’ll have a clear framework for estimating your own Amazon Bedrock implementation costs and understanding the key factors that influence them.
For those who aren’t familiar, Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Amazon Bedrock provides a comprehensive toolkit for powering AI applications, including pre-trained large language models (LLMs), Retrieval Augmented Generation (RAG) capabilities, and seamless integration with existing knowledge bases. This powerful combination enables the creation of chatbots that can understand and respond to customer queries with high accuracy and contextual relevance.
For this example, our Amazon Bedrock chatbot will use a curated set of data sources and RAG to retrieve relevant information in real time. With RAG, the chatbot’s output is enriched with contextual information from our data sources, giving our users a better customer experience.

When estimating Amazon Bedrock pricing, it’s crucial to familiarize yourself with several key terms that significantly influence the expected cost. These components not only form the foundation of how your chatbot functions but also directly impact your pricing calculations. Let’s explore these key components.

Key Components
The figure below demonstrates the architecture of a fully managed RAG solution on AWS.
One of the most challenging aspects of implementing an AI solution is accurately predicting your capacity needs. Without proper capacity estimation, you might either over-provision (leading to unnecessary costs) or under-provision (resulting in performance issues). Let’s walk through how to approach this crucial planning step for a real-world scenario. Before we dive into the numbers, let’s understand the key factors that affect your capacity and costs:
• The size of your knowledge base and how its documents are chunked for retrieval
• The number of embeddings generated and stored
• The volume of user queries and the context attached to each one
• The length of the model’s responses
• The concurrency level your application needs to support
To make this concrete, let’s examine a typical call center implementation. Imagine you’re planning to deploy a customer service chatbot for a mid-sized organization handling product inquiries and support requests. Here’s how we’d break down the capacity planning.

First, consider your knowledge base. In our scenario, we’re working with 10,000 support documents, each averaging 500 tokens in length. These documents need to be chunked into smaller pieces for effective retrieval, with each document typically splitting into 5 chunks. This gives us a total of 5 million tokens for our knowledge base.

For the embedding process, those 10,000 documents will generate approximately 50,000 embeddings when we account for chunking and overlapping content. This is important because embeddings affect both your initial setup costs and ongoing storage needs.
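As a quick sanity check, here’s a minimal sizing sketch in Python, using only the figures from the scenario above:

```python
# Rough capacity sizing for the knowledge base described above.
# All inputs come from the example scenario; adjust them for your own data.

DOCS = 10_000          # support documents in the knowledge base
TOKENS_PER_DOC = 500   # average document length in tokens
CHUNKS_PER_DOC = 5     # chunks produced per document after splitting

kb_tokens = DOCS * TOKENS_PER_DOC    # total tokens to embed
embeddings = DOCS * CHUNKS_PER_DOC   # vectors to generate and store

print(f"Knowledge base tokens: {kb_tokens:,}")   # 5,000,000
print(f"Embeddings to store:   {embeddings:,}")  # 50,000
```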
Now, let’s look at the operational requirements. Based on typical call center volumes, we’re planning for roughly 10,000 user queries per month, with about 150 tokens of retrieved context added to each query.
When we aggregate these numbers, our monthly capacity requirements shape up to 10,000 queries carrying roughly 1.5 million tokens of retrieved context on top of the query and response tokens themselves, plus a one-time embedding pass over the 5 million tokens in our knowledge base.
Understanding these numbers is crucial because they directly impact your costs in several ways:
• Query, context, and response token volumes drive your ongoing, per-request inference costs.
• The total number of knowledge base tokens determines your one-time embedding cost.
• The number of embeddings you store drives the size, and therefore the cost, of your vector store.
This gives us a solid foundation for our cost calculations, which we’ll explore in detail in the next section.
Amazon Bedrock offers flexible pricing. With Amazon Bedrock, you are charged for model inference and customization, and you have a choice of two pricing plans for inference:
1. On-Demand and Batch: Use FMs on a pay-as-you-go basis without having to make any time-based term commitments.
2. Provisioned Throughput: Provision sufficient throughput to meet your application’s performance requirements in exchange for a time-based term commitment.
To estimate the total cost of ownership (TCO) for this scenario, we’ll consider the foundation model, the volume of data in the knowledge base, the estimated number of queries and responses, and the concurrency level mentioned above. We’ll use the on-demand pricing model and show how the pricing works out for some of the foundation models available on Amazon Bedrock.
The cost of this setup is the sum of the LLM inference cost, the one-time embedding cost, and the vector store cost. To estimate the inference cost, you can obtain the number of input tokens, context tokens, and output tokens from the response metadata returned by the LLM:

Total Cost Incurred = ((Input Tokens + Context Tokens) / 1,000) × Price per 1,000 Input Tokens + (Output Tokens / 1,000) × Price per 1,000 Output Tokens + Embedding Cost

For input tokens, we add about 150 tokens of retrieved context per user query. With our assumption of 10,000 user queries, the total context size is therefore 1,500,000 tokens.
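As a worked example, here’s a minimal Python sketch of this formula. The 10,000 queries and 150-token context come from our scenario; the per-query input and output token counts and the per-1,000-token prices are placeholder assumptions, so substitute the current rates for your chosen model:

```python
def inference_cost(queries: int,
                   input_tokens_per_query: int,
                   context_tokens_per_query: int,
                   output_tokens_per_query: int,
                   price_per_1k_input: float,
                   price_per_1k_output: float) -> float:
    """On-demand inference cost following the formula above.

    Prices are USD per 1,000 tokens and vary by model; the values
    in the example call below are placeholders, not published rates.
    """
    input_total = queries * (input_tokens_per_query + context_tokens_per_query)
    output_total = queries * output_tokens_per_query
    input_cost = (input_total / 1000) * price_per_1k_input
    output_cost = (output_total / 1000) * price_per_1k_output
    return input_cost + output_cost

# 10,000 monthly queries with ~150 retrieved-context tokens each (per the
# scenario); the remaining values are hypothetical averages and rates.
monthly = inference_cost(queries=10_000,
                         input_tokens_per_query=100,   # assumed average
                         context_tokens_per_query=150, # from the scenario
                         output_tokens_per_query=250,  # assumed average
                         price_per_1k_input=0.0008,    # placeholder rate
                         price_per_1k_output=0.0032)   # placeholder rate
print(f"Estimated monthly inference cost: ${monthly:,.2f}")
```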
The following sections compare estimated monthly costs for various models on Amazon Bedrock, applying the on-demand pricing formula to our example use case:
Embeddings Cost:
For text embeddings on Amazon Bedrock, we can choose from the Amazon Titan Text Embeddings V2 model or the Cohere Embed models. In this example, we calculate a one-time cost for the embeddings.
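As a sketch, the one-time embedding cost is simply the knowledge base token count divided by 1,000 and multiplied by the embedding model’s per-1,000-token price. The rate below is a placeholder, not a published price; check the Amazon Bedrock pricing page for the current figure:

```python
# One-time embedding cost: embed the full 5,000,000-token knowledge base once.
KB_TOKENS = 5_000_000
PRICE_PER_1K_EMBED_TOKENS = 0.00002  # placeholder USD rate per 1K input tokens

one_time_embedding_cost = (KB_TOKENS / 1000) * PRICE_PER_1K_EMBED_TOKENS
print(f"One-time embedding cost: ${one_time_embedding_cost:,.2f}")
```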
The cost of a vector store typically has two components: the size of the stored vector data and the number of requests made to the store. You can let the Amazon Bedrock console set up a vector store in Amazon OpenSearch Serverless for you, or use one that you have created in a supported service and configured with the appropriate fields. If you’re using OpenSearch Serverless as part of your setup, you’ll need to factor in its costs; pricing details can be found at OpenSearch Service Pricing.
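For a rough sense of the vector data size, you can multiply the number of stored embeddings by the vector dimension and the bytes per value. The dimension and float width below are assumptions (Amazon Titan Text Embeddings V2 defaults to 1,024 dimensions, stored here as 4-byte floats), so confirm them against your own configuration:

```python
# Back-of-the-envelope estimate of raw vector storage, before index overhead.
EMBEDDINGS = 50_000      # from the capacity plan above
DIMENSIONS = 1_024       # assumed embedding dimension
BYTES_PER_FLOAT = 4      # assumed 32-bit float storage

storage_bytes = EMBEDDINGS * DIMENSIONS * BYTES_PER_FLOAT
print(f"Raw vector data: ~{storage_bytes / 1e6:.0f} MB")  # ~205 MB
```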
Using the on-demand pricing formula, we can now calculate the overall cost for some of the foundation models (FMs) available on Amazon Bedrock, including the embeddings cost.
• Amazon Nova:
• Meta Llama:
Evaluate models not just on their natural language understanding (NLU) and generation (NLG) capabilities, but also on their price-per-token ratios for both input and output processing. Consider whether premium models with higher per-token costs deliver proportional value for your specific use case, or if more cost-effective alternatives like Amazon Nova Lite or Meta Llama models can meet your performance requirements at a fraction of the cost.
Understanding and estimating Amazon Bedrock costs doesn’t have to be overwhelming. As we’ve demonstrated through our customer service chatbot example, breaking down the pricing into its core components – token usage, embeddings, and model selection – makes it manageable and predictable.
Key takeaways for planning your Bedrock implementation costs:
• Separate one-time costs (embedding your knowledge base) from ongoing costs (inference and vector storage).
• Estimate your token volumes first; they are the primary driver of on-demand inference pricing.
• Compare models on price per 1,000 input and output tokens as well as capability, since a cheaper model that meets your requirements can cut costs substantially.
• Include the vector store in your total: its cost depends on both the size of the stored vectors and the volume of requests.
By following this systematic approach to cost estimation, you can confidently plan your Amazon Bedrock implementation and choose the most cost-effective configuration for your specific use case. Remember that the cheapest option isn’t always the best – consider the balance between cost, performance, and your specific requirements when making your final decision.
With Amazon Bedrock, you have the flexibility to choose the most suitable model and pricing structure for your use case. We encourage you to explore the AWS Pricing Calculator for more detailed cost estimates based on your specific requirements.
To learn more about building and optimizing chatbots with Amazon Bedrock, check out the workshop Building with Amazon Bedrock.
We’d love to hear about your experiences building chatbots with Amazon Bedrock. Share your success stories or challenges in the comments!