We’re excited to announce that Amazon Bedrock Custom Model Import now supports Qwen models. You can now import custom weights for the Qwen2, Qwen2_VL, and Qwen2_5_VL architectures, including models like Qwen 2, Qwen 2.5 Coder, Qwen 2.5 VL, and QwQ-32B. You can bring your own customized Qwen models into Amazon Bedrock and deploy them in a fully managed, serverless environment—without having to manage infrastructure or model serving.
In this post, we cover how to deploy Qwen 2.5 models with Amazon Bedrock Custom Model Import, making them accessible to organizations looking to use state-of-the-art AI capabilities within the AWS infrastructure at an effective cost.
Qwen 2 and 2.5 are families of large language models available in a wide range of sizes and specialized variants to suit diverse needs, including general-purpose, coder, and vision-language (VL) models.
Amazon Bedrock Custom Model Import enables the import and use of your customized models alongside existing foundation models (FMs) through a single serverless, unified API. You can access your imported custom models on demand and without the need to manage the underlying infrastructure. Accelerate your generative AI application development by integrating your supported custom models with native Amazon Bedrock tools and features like Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Agents. Amazon Bedrock Custom Model Import is generally available in the US East (N. Virginia), US West (Oregon), and Europe (Frankfurt) AWS Regions.

Now, we’ll explore how you can use Qwen 2.5 models for two common use cases: as a coding assistant and for image understanding. Qwen2.5-Coder is a state-of-the-art code model, matching the capabilities of proprietary models like GPT-4o. It supports over 90 programming languages and excels at code generation, debugging, and reasoning. Qwen2.5-VL brings advanced multimodal capabilities. According to Qwen, Qwen2.5-VL is not only proficient at recognizing objects such as flowers and animals, but also at analyzing charts, extracting text from images, interpreting document layouts, and processing long videos.
Before importing the Qwen model with Amazon Bedrock Custom Model Import, make sure that you have the following in place:
In this example, we demonstrate how to build a coding assistant using the Qwen2.5-Coder-7B-Instruct model. You will use Qwen/Qwen2.5-Coder-7B-Instruct for the rest of the walkthrough. We don’t demonstrate fine-tuning steps, but you can also fine-tune the model before importing it.
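One way to fetch the checkpoint locally is a sketch like the following, assuming the `huggingface_hub` package is available; the local directory name is illustrative:

```python
def download_model(repo_id: str, local_dir: str) -> str:
    """Fetch every file in the Hugging Face repo into local_dir and return the path."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)

# The full 7B checkpoint is roughly 15 GB, so this can take a while:
# download_model("Qwen/Qwen2.5-Coder-7B-Instruct", "./qwen2.5-coder-7b-instruct")
```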
Depending on your model size, this could take a few minutes. When completed, your Qwen Coder 7B model folder will contain the following files:

- Configuration files: config.json, generation_config.json, tokenizer_config.json, tokenizer.json, and vocab.json
- Model weights: the .safetensors files and model.safetensors.index.json
- LICENSE, README.md, and merges.txt
Upload the model files to Amazon S3 using boto3 or the AWS Command Line Interface (AWS CLI):

```shell
aws s3 cp ./extractedfolder s3://yourbucket/path/ --recursive
```
You can also do this using the AWS Management Console for Amazon Bedrock.
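The boto3 route can be sketched as follows; the bucket name and prefix are placeholders, and `upload_folder` assumes your AWS credentials are already configured:

```python
import os

def s3_keys_for_folder(local_dir: str, prefix: str):
    """Yield (local_path, s3_key) pairs for every file under local_dir."""
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_dir)
            yield local_path, f"{prefix.rstrip('/')}/{rel.replace(os.sep, '/')}"

def upload_folder(local_dir: str, bucket: str, prefix: str) -> int:
    """Upload each file in the model folder under a common S3 prefix."""
    import boto3  # pip install boto3; requires AWS credentials
    s3 = boto3.client("s3")
    count = 0
    for local_path, key in s3_keys_for_folder(local_dir, prefix):
        s3.upload_file(local_path, bucket, key)
        count += 1
    return count

# upload_folder("./qwen2.5-coder-7b-instruct", "yourbucket", "path")
```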
If you’re using your own role, make sure you add the following trust relationship as described in Create a service role for model import.
After your model is imported, wait for model inference to be ready, and then chat with the model on the playground or through the API. In the following example, we append Python to the prompt so the model directly outputs Python code to list items in an S3 bucket. Remember to use the right chat template to input prompts in the format required. For example, you can get the right chat template for any compatible model on Hugging Face using the following code:
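A common way to produce the correct prompt format is to let the model’s tokenizer apply its own chat template; this sketch assumes the `transformers` package and network access to Hugging Face:

```python
def format_prompt(model_id: str, messages: list) -> str:
    """Render messages with the model's own chat template."""
    from transformers import AutoTokenizer  # pip install transformers
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# messages = [{"role": "user", "content": "List objects in an S3 bucket using Python"}]
# format_prompt("Qwen/Qwen2.5-Coder-7B-Instruct", messages)
```

For Qwen 2.5 models, the template expands to ChatML-style markers such as `<|im_start|>user … <|im_end|>` followed by an opening `<|im_start|>assistant` turn.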
Note that when using the invoke_model API, you must use the full Amazon Resource Name (ARN) for the imported model. You can find the model ARN in the Amazon Bedrock console by navigating to the Imported models section and then viewing the Model details page, as shown in the following figure.
After the model is ready for inference, you can use the Chat playground in the Amazon Bedrock console or the APIs to invoke the model.
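A minimal invocation sketch with boto3 follows. The model ARN is a placeholder, and the body parameters (`max_tokens`, `temperature`) follow common Custom Model Import examples; verify them against the parameters your imported model accepts:

```python
import json

def build_request(model_arn: str, prompt: str, max_tokens: int = 512) -> dict:
    """Request parameters for the bedrock-runtime invoke_model call."""
    return {
        "modelId": model_arn,  # full ARN of the imported model (placeholder)
        "body": json.dumps(
            {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.1}
        ),
    }

def invoke(model_arn: str, prompt: str) -> dict:
    """Invoke the imported model; requires AWS credentials."""
    import boto3  # pip install boto3
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(**build_request(model_arn, prompt))
    return json.loads(response["body"].read())
```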
Qwen2.5-VL-* offers multimodal capabilities, combining vision and language understanding in a single model. This section demonstrates how to deploy Qwen2.5-VL using Amazon Bedrock Custom Model Import and test its image understanding capabilities.
Download the model from Hugging Face and upload it to Amazon S3:
Next, import the model to Amazon Bedrock (either through the console or the API):
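The API route can be sketched with boto3’s create_model_import_job call; the job name, model name, role ARN, and S3 URI below are all placeholders for your own values:

```python
# Placeholders throughout: substitute your own names, ARN, and S3 URI.
import_params = {
    "jobName": "qwen25-vl-import-job",
    "importedModelName": "qwen25-vl-7b",
    "roleArn": "arn:aws:iam::111122223333:role/BedrockModelImportRole",
    "modelDataSource": {"s3DataSource": {"s3Uri": "s3://yourbucket/path/"}},
}

def start_import_job(params: dict) -> dict:
    """Start the import job; Amazon Bedrock copies the weights from S3."""
    import boto3  # requires AWS credentials with Bedrock permissions
    return boto3.client("bedrock").create_model_import_job(**params)

# start_import_job(import_params)
```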
After the import is complete, test the model with an image input. The Qwen2.5-VL-* model requires proper formatting of multimodal inputs:
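One way to format a multimodal request is sketched below. The field names (especially `images`) are assumptions to verify against your deployment; the vision tokens (`<|vision_start|>`, `<|image_pad|>`, `<|vision_end|>`) are Qwen 2.5 VL’s own special tokens:

```python
import base64
import json

def build_vl_request(model_arn: str, image_bytes: bytes, question: str) -> dict:
    """Assumed request shape: base64-encoded image plus a ChatML prompt
    containing Qwen's vision tokens. Verify field names for your model."""
    prompt = (
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        f"{question}<|im_end|>\n<|im_start|>assistant\n"
    )
    body = {
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("utf-8")],  # assumed field
        "max_tokens": 512,
    }
    return {"modelId": model_arn, "body": json.dumps(body)}
```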
When provided with an example image of a cat (such as the following image), the model accurately describes key features such as the cat’s position, fur color, eye color, and general appearance. This demonstrates the Qwen2.5-VL-* model’s ability to process visual information and generate relevant text descriptions.
The model’s response:
You can use Amazon Bedrock Custom Model Import to use your custom model weights within Amazon Bedrock for supported architectures, serving them alongside Amazon Bedrock hosted FMs in a fully managed way through On-Demand mode. Custom Model Import doesn’t charge for model import. You are charged for inference based on two factors: the number of active model copies and their duration of activity. Billing occurs in 5-minute increments, starting from the first successful invocation of each model copy. The price per model copy per minute varies based on factors including architecture, context length, Region, and compute unit version, and is tiered by model copy size. The number of Custom Model Units required for hosting depends on the model’s architecture, parameter count, and context length.

Amazon Bedrock automatically manages scaling based on your usage patterns. If there are no invocations for 5 minutes, it scales to zero and scales back up when needed, though this might involve cold-start latency of up to a minute. Additional copies are added if inference volume consistently exceeds single-copy concurrency limits. The maximum throughput and concurrency per copy is determined during import, based on factors such as input/output token mix, hardware type, model size, architecture, and inference optimizations.
For more information, see Amazon Bedrock pricing.
To avoid ongoing charges after completing the experiments, delete the imported models from the Imported models page in the Amazon Bedrock console.
Remember that while Amazon Bedrock Custom Model Import doesn’t charge for the import process itself, you are billed for model inference usage and storage.
Amazon Bedrock Custom Model Import empowers organizations to use powerful publicly available models like Qwen 2.5, among others, while benefiting from enterprise-grade infrastructure. The serverless nature of Amazon Bedrock eliminates the complexity of managing model deployments and operations, allowing teams to focus on building applications rather than infrastructure. With features like auto scaling, pay-per-use pricing, and seamless integration with AWS services, Amazon Bedrock provides a production-ready environment for AI workloads. The combination of Qwen 2.5’s advanced AI capabilities and Amazon Bedrock managed infrastructure offers an optimal balance of performance, cost, and operational efficiency. Organizations can start with smaller models and scale up as needed, while maintaining full control over their model deployments and benefiting from AWS security and compliance capabilities.
For more information, refer to the Amazon Bedrock User Guide.