Categories: FAANG

How Model Armor can help protect your AI apps from prompt injections and jailbreaks

As AI continues to rapidly develop, it’s crucial that IT teams address the business and organizational risks posed by two common threats: prompt injection and jailbreaking. 

Earlier this year we introduced Model Armor, a model-agnostic advanced screening solution that can help safeguard gen AI prompts and responses, and agent interactions. Model Armor offers a comprehensive suite of integration options, including direct API integration for developers, and inline integrations with Apigee, Vertex AI, Agentspace, and network service extensions. 

Many organizations already rely on Apigee as an API gateway, using capabilities such as Spike Arrest, Quota, and OAuth 2.0 for traffic and security management. By integrating with Model Armor, Apigee can become a critical security layer for generative AI interactions. 

This powerful combination allows for proactive screening of prompts and responses, ensuring AI applications are secure, compliant, and operate within defined guardrails. Today, we’re explaining how to get started using Model Armor with Apigee to secure your AI apps.

How to use Model Armor for AI app protection

Model Armor has five main capabilities.

  1. Prompt injection and jailbreak detection: It identifies and blocks attempts to manipulate an LLM into ignoring its instructions and safety filters. 

  2. Sensitive data protection: It can detect, classify, and prevent the exposure of sensitive information, including personally identifiable information (PII) and confidential data in both user prompts and LLM responses.

  3. Malicious URL detection: It scans for malicious and phishing links in both the input and output to prevent users from being directed to harmful websites, and to stop the LLM from inadvertently generating dangerous links.

  4. Harmful content filtering: It has built-in filters to detect content that is sexually explicit, dangerous, and contains harassment or hate speech, ensuring that outputs align with responsible AI principles.

  5. Document screening: It can also screen text in documents, including PDFs and Microsoft Office files, for malicious and sensitive content.

Model Armor integration with Apigee and LLMs.

Model Armor is designed to be model-independent and cloud-agnostic, meaning it can help to secure any gen AI model via REST APIs, regardless of whether it’s running on Google Cloud, another cloud provider, or a different platform. It exposes a REST endpoint or inline integration with other Google AI and networking services to perform these functions.

How to get started

  1. In the Google Cloud console, enable the Model Armor API and click on “Create a template.”

  2. Enable prompt injection and jailbreak detection. You can also enable the other safety filters as shown above, and click “Create.”

  3. Create a service account (or update an existing service account that has been used to deploy Apigee proxies,) and enable permissions in Model Armor User (roles/modelarmor.user) and Model Armor Viewer (roles/modelarmor.viewer) on the service account.

  4. From the Apigee console, create a new Proxy and enable the Model Armor policies.

  5. If you already have a proxy for the LLM calls, add two Apigee policies in the flow: SanitizeUserPrompt and SanitizeModelResponse.

  6. In the policy details, update reference to the Model Armor template created earlier. For example, projects/some-test-project/locations/us-central-1/templates/safeguard_llms. Similarly, configure the <SanitizeModelResponse> policy.

  7. Provide the source of the user prompt in the request payload Eg: JSON path.

  8. Configure the LLM endpoint as the target backend of Apigee Proxy and deploy the proxy by using the Service  account configured above. Your proxy should now be working and interacting with the Model Armor and LLM endpoints.

  9. During proxy execution, when Apigee invokes the Model armor, it returns a response that includes the “filter execution state” and “match state”. Apigee populates several flow variables  with information from the Model Armor response like SanitizeUserPrompt.POLICY_NAME.piAndJailbreakFilterResult.executionState and SanitizeUserPrompt.POLICY_NAME.piAndJailbreakFilterResult.matchState

  10. You can use a <Condition> to check if this flow variable equals MATCH_FOUND and configure the <RaiseFault> policy within your proxy’s flow.

Steps to configure Model Armor and integrate with Apigee to protect AI applications.

Review the findings

You can view the Model Armor findings in the AI Protection dashboard on the Security Command Center. A graph presents the volume of prompts and responses analyzed by Model Armor, along with the count of identified issues. 

It also summarizes various detected issue types, including prompt injection, jailbreak detection, and sensitive data identification.

Prompt and response content analytics provided by AI Protection dashboard.

With your knowledge of Model Armor, you’re ready to adjust the floor settings. Floor settings define the minimum security and safety requirements for all Model Armor templates in a specific part of your Google Cloud resource hierarchy. You can set confidence levels for responsible AI safety categories (such as hate speech and harassment,) prompt injection and jailbreak detection, and sensitive data protection (including topicality.)

Model Armor floor setting defines confidence levels for filtering.

Model Armor logging captures administrative activities like creating or updating templates and sanitation operations on prompts and responses, which can be viewed in Cloud Logging. You can configure logging within Model Armor templates to include details such as the prompt, response, and evaluation results.

Learn more by getting hands-on

Explore the tutorial for integrating Apigee with Model Armor here, and try the guided lab on configuring Model Armor.

AI Generated Robotic Content

Recent Posts

A lot of major updates on Flux Real-Time pipeline

Hello! Just a week ago I have posted here announce of my real-time streaming pipeline…

8 hours ago

Old Oil and Gas Wells Could Find Second Life Producing Clean Energy

States across the US are looking to take major sources of pollution and use them…

9 hours ago

It appears that Microsoft uploaded an image model on HuggingFace and then deleted it.

https://x.com/HuggingPapers/status/2055176632491778363 https://huggingface.co/microsoft/Lens https://huggingface.co/microsoft/Lens-Turbo submitted by /u/Total-Resort-3120 [link] [comments]

1 day ago

Restrict access to sensitive documents in your Amazon Quick knowledge bases for Amazon S3

Organizations that must restrict access to sensitive documents increasingly rely on AI-driven search and chat…

1 day ago

Gemini Live Agent Challenge: Announcing the winners and highlights

The Gemini Live Agent Challenge is officially in the books! We challenged developers worldwide to…

1 day ago

The Best Outdoor Deals From the REI Anniversary Sale 2026

It’s the best time of year to pick up all the outdoor gadgets, tents, sleeping…

1 day ago