Categories: FAANG

Mistral-Small-24B-Instruct-2501 is now available on SageMaker Jumpstart and Amazon Bedrock Marketplace

ML 18221 image001

Today, we’re excited to announce that Mistral-Small-24B-Instruct-2501—a twenty-four billion parameter large language model (LLM) from Mistral AI that’s optimized for low latency text generation tasks—is available for customers through Amazon SageMaker JumpStart and Amazon Bedrock Marketplace. Amazon Bedrock Marketplace is a new capability in Amazon Bedrock that developers can use to discover, test, and use over 100 popular, emerging, and specialized foundation models (FMs) alongside the current selection of industry-leading models in Amazon Bedrock. These models are in addition to the industry-leading models that are already available on Amazon Bedrock. You can also use this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference. In this post, we walk through how to discover, deploy, and use Mistral-Small-24B-Instruct-2501.

Overview of Mistral Small 3 (2501)

Mistral Small 3 (2501), a latency-optimized 24B-parameter model released under Apache 2.0 maintains a balance between performance and computational efficiency. Mistral offers both the pretrained (Mistral-Small-24B-Base-2501) and instruction-tuned (Mistral-Small-24B-Instruct-2501) checkpoints of the model under Apache 2.0. Mistral Small 3 (2501) features a 32 k token context window. According to Mistral, the model demonstrates strong performance in code, math, general knowledge, and instruction following compared to its peers. Mistral Small 3 (2501) is designed for the 80% of generative AI tasks that require robust language and instruction following performance with very low latency. The instruction-tuning process is focused on improving the model’s ability to follow complex directions, maintain coherent conversations, and generate accurate, context-aware responses. The 2501 version follows previous iterations (Mistral-Small-2409 and Mistral-Small-2402) released in 2024, incorporating improvements in instruction-following and reliability. Currently, the instruct version of this model, Mistral-Small-24B-Instruct-2501 is available for customers to deploy and use on SageMaker JumpStart and Bedrock Marketplace.

Optimized for conversational assistance

Mistral Small 3 (2501) excels in scenarios where quick, accurate responses are critical, such as in virtual assistants. This includes virtual assistants where users expect immediate feedback and near real-time interactions. Mistral Small 3 (2501) can handle rapid function execution when used as part of automated or agentic workflows. The architecture is designed to typically respond in less than 100 milliseconds, according to Mistral, making it ideal for customer service automation, interactive assistance, live chat, and content moderation.

Performance metrics and benchmarks

According to Mistral, the instruction-tuned version of the model achieves over 81% accuracy on Massive Multitask Language Understanding (MMLU) with 150 tokens per second latency, making it currently the most efficient model in its category. In third-party evaluations conducted by Mistral, the model demonstrates competitive performance against larger models such as Llama 3.3 70B and Qwen 32B. Notably, Mistral claims that the model performs at the same level as Llama 3.3 70B instruct and is more than three times faster on the same hardware.

SageMaker JumpStart overview

SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is model hubs, which offer a vast catalog of pre-trained models, such as Mistral, for a variety of tasks.

You can now discover and deploy Mistral models in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in a secure AWS environment and under your VPC controls, helping to support data security for enterprise security needs.

Prerequisites

To try Mistral-Small-24B-Instruct-2501 in SageMaker JumpStart, you need the following prerequisites:

An AWS account that will contain all your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, see Identity and Access Management for Amazon SageMaker.
Access to Amazon SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
Access to accelerated instances (GPUs) for hosting the model.

Amazon Bedrock Marketplace overview

To get started, in the AWS Management Console for Amazon Bedrock, select Model catalog in the Foundation models section of the navigation pane. Here, you can search for models that help you with a specific use case or language. The results of the search include both serverless models and models available in Amazon Bedrock Marketplace. You can filter results by provider, modality (such as text, image, or audio), or task (such as classification or text summarization).

Deploy Mistral-Small-24B-Instruct-2501 in Amazon Bedrock Marketplace

To access Mistral-Small-24B-Instruct-2501 in Amazon Bedrock, complete the following steps:

On the Amazon Bedrock console, select Model catalog under Foundation models in the navigation pane.

At the time of writing this post, you can use the InvokeModel API to invoke the model. It doesn’t support Converse APIs or other Amazon Bedrock tooling.

Filter for Mistral as a provider and select the Mistral-Small-24B-Instruct-2501

The model detail page provides essential information about the model’s capabilities, pricing structure, and implementation guidelines. You can find detailed usage instructions, including sample API calls and code snippets for integration.

The page also includes deployment options and licensing information to help you get started with Mistral-Small-24B-Instruct-2501 in your applications.

To begin using Mistral-Small-24B-Instruct-2501, choose Deploy.
You will be prompted to configure the deployment details for Mistral-Small-24B-Instruct-2501. The model ID will be pre-populated.
1. For Endpoint name, enter an endpoint name (up to 50 alphanumeric characters).
2. For Number of instances, enter a number between 1and 100.
3. For Instance type, select your instance type. For optimal performance with Mistral-Small-24B-Instruct-2501, a GPU-based instance type such as ml.g6.12xlarge is recommended.
4. Optionally, you can configure advanced security and infrastructure settings, including virtual private cloud (VPC) networking, service role permissions, and encryption settings. For most use cases, the default settings will work well. However, for production deployments, you might want to review these settings to align with your organization’s security and compliance requirements.
Choose Deploy to begin using the model.

When the deployment is complete, you can test Mistral-Small-24B-Instruct-2501 capabilities directly in the Amazon Bedrock playground.

Choose Open in playground to access an interactive interface where you can experiment with different prompts and adjust model parameters such as temperature and maximum length.

When using Mistral-Small-24B-Instruct-2501 with the Amazon Bedrock InvokeModel and Playground console, use DeepSeek’s chat template for optimal results. For example, <｜begin▁of▁sentence｜><｜User｜>content for inference<｜Assistant｜>.

This is an excellent way to explore the model’s reasoning and text generation abilities before integrating it into your applications. The playground provides immediate feedback, helping you understand how the model responds to various inputs and letting you fine-tune your prompts for optimal results.

You can quickly test the model in the playground through the UI. However, to invoke the deployed model programmatically with Amazon Bedrock APIs, you need to get the endpoint Amazon Resource Name (ARN).

Discover Mistral-Small-24B-Instruct-2501 in SageMaker JumpStart

You can access Mistral-Small-24B-Instruct-2501 through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform ML development steps, from preparing data to building, training, and deploying your ML models. For more information about how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.

In the SageMaker Studio console, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
Select HuggingFace.
From the SageMaker JumpStart landing page, search for Mistral-Small-24B-Instruct-2501 using the search box.
Select a model card to view details about the model such as license, data used to train, and how to use the model. Choose Deploy to deploy the model and create an endpoint.

Deploy Mistral-Small-24B-Instruct-2501 with the SageMaker SDK

Deployment starts when you choose Deploy. After deployment finishes, you will see that an endpoint is created. Test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.

To deploy using the SDK, start by selecting the Mistral-Small-24B-Instruct-2501 model, specified by the model_id with the value mistral-small-24B-instruct-2501. You can deploy your choice of the selected models on SageMaker using the following code. Similarly, you can deploy Mistral-Small-24b-Instruct-2501 using its model ID.
```
from sagemaker.jumpstart.model import JumpStartModel 

accept_eula = True 

model = JumpStartModel(model_id="huggingface-llm-mistral-small-24b-instruct-2501") 
predictor = model.deploy(accept_eula=accept_eula)
```

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The EULA value must be explicitly defined as True to accept the end-user license agreement (EULA). See AWS service quotas for how to request a service quota increase.

After the model is deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

prompt = "Hello!"
payload = {
    "messages": [
        {
            "role": "user",
            "content": prompt
        }
    ],
    "max_tokens": 4000,
    "temperature": 0.1,
    "top_p": 0.9,
}
    
response = predictor.predict(payload)
print(response['choices'][0]['message']['content'])

Retail math example

Here’s an example of how Mistral-Small-24B-Instruct-2501 can break down a common shopping scenario. In this case, you ask the model to calculate the final price of a shirt after applying multiple discounts—a situation many of us face while shopping. Notice how the model provides a clear, step-by-step solution to follow.

prompt = "A store is having a 20% off sale, and you have an additional 10% off coupon. If you buy a shirt that originally costs $50, how much will you pay?"
payload = {
    "messages": [
        {
            "role": "user",
            "content": prompt
        }
    ],
    "max_tokens": 1000,
    "temperature": 0.1,
    "top_p": 0.9,
}
    
response = predictor.predict(payload)
print(response['choices'][0]['message']['content'])

The following is the output:

First, we'll apply the 20% off sale discount to the original price of the shirt.

20% of $50 is calculated as:
0.20 * $50 = $10

So, the price after the 20% discount is:
$50 - $10 = $40

Next, we'll apply the additional 10% off coupon to the new price of $40.

10% of $40 is calculated as:
0.10 * $40 = $4

So, the price after the additional 10% discount is:
$40 - $4 = $36

Therefore, you will pay $36 for the shirt.

The response shows clear step-by-step reasoning without introducing incorrect information or hallucinated facts. Each mathematical step is explicitly shown, making it simple to verify the accuracy of the calculations.

Clean up

To avoid unwanted charges, complete the following steps in this section to clean up your resources.

Delete the Amazon Bedrock Marketplace deployment

If you deployed the model using Amazon Bedrock Marketplace, complete the following steps:

On the Amazon Bedrock console, under Foundation models in the navigation pane, select Marketplace deployments.
In the Managed deployments section, locate the endpoint you want to delete.
Select the endpoint, and on the Actions menu, select Delete.
Verify the endpoint details to make sure you’re deleting the correct deployment:
1. Endpoint name
2. Model name
3. Endpoint status
Choose Delete to delete the endpoint.
In the deletion confirmation dialog, review the warning message, enter confirm, and choose Delete to permanently remove the endpoint.

Delete the SageMaker JumpStart predictor

After you’re done running the notebook, make sure to delete all resources that you created in the process to avoid additional billing. For more details, see Delete Endpoints and Resources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Mistral-Small-24B-Instruct-2501 in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

For more Mistral resources on AWS, check out the Mistral-on-AWS GitHub repo.

About the Authors

Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is AWS AI accelerators (AWS Neuron). He holds a Bachelor’s degree in Computer Science and Bioinformatics.

Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.

Shane Rai is a Principal Generative AI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML services offered by AWS, including model offerings from top tier foundation model providers.

Avan Bala is a Solutions Architect at AWS. His area of focus is AI for DevOps and machine learning. He holds a bachelor’s degree in Computer Science with a minor in Mathematics and Statistics from the University of Maryland. Avan is currently working with the Enterprise Engaged East Team and likes to specialize in projects about emerging AI technologies.

Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, the machine learning and generative AI hub provided by SageMaker. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.

AI Generated Robotic Content