
How AutoScout24 built a Bot Factory to standardize AI agent development with Amazon Bedrock

AutoScout24 is Europe’s leading automotive marketplace platform that connects buyers and sellers of new and used cars, motorcycles, and commercial vehicles across several European countries. Their long-term vision is to build a Bot Factory, a centralized framework for creating and deploying artificial intelligence (AI) agents that can perform tasks and make decisions within workflows, to …


Palo Alto Networks automates customer intelligence document creation with agentic design

For a global cybersecurity leader like Palo Alto Networks, a comprehensive understanding of each customer is critical to success. For every engagement, the Palo Alto Networks pre-sales team centralizes this understanding in an internal Document of Record (DOR), a vital asset that provides a standardized, 360-degree view of the customer for sales …


Securing Amazon Bedrock cross-Region inference: Geographic and global

Adoption of generative AI inference has grown as organizations build more operational workloads that use AI capabilities in production at scale. To help customers scale their generative AI applications, Amazon Bedrock offers cross-Region inference (CRIS) profiles, a powerful feature organizations can use to seamlessly distribute inference processing across multiple …
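As a rough illustration of how CRIS profiles are addressed: geographic profiles prefix a base model ID with a region-group code (such as `us.`, `eu.`, or `apac.`), while global profiles use the `global.` prefix. The resulting profile ID is what you would pass as the `modelId` when calling Amazon Bedrock. The base model ID below is illustrative, so check the Bedrock console for the exact IDs available in your account.

```python
# Sketch: construct a CRIS inference profile ID from a base model ID.
# The prefix convention (us./eu./apac./global.) follows AWS documentation;
# the base model ID here is a placeholder, not a guaranteed-available model.
def cris_profile_id(base_model_id: str, scope: str) -> str:
    prefixes = {"us": "us.", "eu": "eu.", "apac": "apac.", "global": "global."}
    return prefixes[scope] + base_model_id

base = "anthropic.claude-sonnet-4-20250514-v1:0"
print(cris_profile_id(base, "eu"))      # eu.anthropic.claude-sonnet-4-20250514-v1:0
print(cris_profile_id(base, "global"))  # global.anthropic.claude-sonnet-4-20250514-v1:0
```

You would then supply the returned profile ID as the `modelId` argument to the Bedrock Runtime `Converse` or `InvokeModel` API, letting the service route the request across Regions within that scope.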

A gRPC transport for the Model Context Protocol

AI agents are moving from test environments to the core of enterprise operations, where they must interact reliably with external tools and systems to execute complex, multi-step goals. The Model Context Protocol (MCP) is the standard that makes this agent-to-tool communication possible. In fact, just last month we announced the release of fully managed, …

Over-Searching in Search-Augmented Large Language Models

Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. However, they often over-search: they unnecessarily invoke the search tool even when it does not improve response quality, which leads to computational inefficiency and to hallucinations from incorporating irrelevant context. In this work, we conduct a systematic evaluation of over-searching across multiple dimensions, including …


How Omada Health scaled patient care by fine-tuning Llama models on Amazon SageMaker AI

This post is co-written with Sunaina Kavi, AI/ML Product Manager at Omada Health. Omada Health, a longtime innovator in virtual healthcare delivery, launched a new nutrition experience in 2025, featuring OmadaSpark, an AI agent trained with robust clinical input that delivers real-time motivational interviewing and nutrition education. It was built on AWS. OmadaSpark was designed …

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. …

AdaBoN: Adaptive Best-of-N Alignment

Recent advances in test-time alignment methods, such as Best-of-N sampling, offer a simple and effective way to steer language models (LMs) toward preferred behaviors using reward models (RMs). However, these approaches can be computationally expensive, especially when applied uniformly across prompts without accounting for differences in alignment difficulty. In this work, we propose a prompt-adaptive …
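The Best-of-N baseline this work builds on can be sketched in a few lines: draw N candidate responses from the model and return the one the reward model scores highest. The sampler and reward function below are stubs standing in for a real LM and RM (they are not from the paper); the adaptive part of AdaBoN, choosing N per prompt, is not shown here.

```python
import random

def best_of_n(prompt, sample_fn, reward_fn, n=4, seed=0):
    """Draw n candidate responses and return the highest-reward one."""
    rng = random.Random(seed)
    candidates = [sample_fn(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward_fn)

# Stub sampler: a real system would call a language model here.
def sample_fn(prompt, rng):
    filler = " ".join(rng.choice(["ok", "sure", "indeed"]) for _ in range(rng.randint(1, 5)))
    return prompt + " " + filler

# Stub reward model: prefers longer responses, purely for illustration.
def reward_fn(response):
    return len(response.split())

best = best_of_n("Explain MCP:", sample_fn, reward_fn, n=8)
print(best)
```

The cost problem the abstract points to is visible in the sketch: every prompt pays for `n` model calls, whether the prompt is easy or hard, which is what a prompt-adaptive choice of N aims to avoid.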


Crossmodal search with Amazon Nova Multimodal Embeddings

Amazon Nova Multimodal Embeddings processes text, documents, images, video, and audio through a single model architecture. Available through Amazon Bedrock, the model converts different input modalities into numerical embeddings within the same vector space, supporting direct similarity calculations regardless of content type. We developed this unified model to reduce the need for separate embedding models, …
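Because all modalities land in the same vector space, cross-modal search reduces to nearest-neighbor lookup by cosine similarity. The sketch below uses toy 4-dimensional vectors in place of real Nova embeddings (which you would obtain from Amazon Bedrock; the file names and dimensionality here are placeholders), but the similarity math is the same regardless of whether the query came from text, an image, or audio.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors standing in for Nova embeddings of mixed-modality content.
index = {
    "photo_of_dog.jpg": [0.9, 0.1, 0.0, 0.1],
    "engine_audio.wav": [0.0, 0.8, 0.5, 0.1],
    "manual.pdf":       [0.1, 0.2, 0.9, 0.3],
}

query_vec = [0.85, 0.15, 0.05, 0.1]  # e.g. the embedding of the text query "a dog"

best_match = max(index, key=lambda k: cosine(query_vec, index[k]))
print(best_match)  # photo_of_dog.jpg
```

In production you would store the embeddings in a vector database rather than a dict, but the ranking step, maximizing cosine similarity against the query embedding, is unchanged.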