Evaluate gen AI models with Vertex AI evaluation service and LLM comparator

It’s a persistent question: How do you know which generative AI model is the best choice for your needs? It all comes down to smart evaluation. In this post, we’ll share how to perform pairwise model evaluations – a way of comparing two models directly against each other – using Vertex AI evaluation service and …

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

In our previous blog posts, we explored various techniques such as fine-tuning large language models (LLMs), prompt engineering, and Retrieval Augmented Generation (RAG) using Amazon Bedrock to generate impressions from the findings section in radiology reports using generative AI. Part 1 focused on model fine-tuning. Part 2 introduced RAG, which combines LLMs with external knowledge …

Demonstrating the AI-driven telecom at Mobile World Congress

Telecoms, like all businesses, are wondering how AI can transform their operations. And there’s no better way to show how to build the AI-driven telecom than with demos. Join us at Mobile World Congress 2025, March 3–6 in Barcelona, Hall 2, Booth #2H40, where we’ll be highlighting key agent use cases where AI is becoming …

How Pattern PXM’s Content Brief is driving conversion on ecommerce marketplaces using AI

Brands today are juggling a million things, and keeping product content up-to-date is at the top of the list. Between decoding the endless requirements of different marketplaces, wrangling inventory across channels, adjusting product listings to catch a customer’s eye, and trying to outpace shifting trends and fierce competition, it’s a lot. And let’s face it—staying …

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models’ compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results from …

Empowering the Warfighter: Palantir’s Partnership with Microsoft

Promoting Army readiness through seamless coordination between the Palantir-powered Army Vantage platform and Microsoft Power BI

Better Together

As the Department of Defense (DoD) increasingly relies on software and data to drive mission readiness and operations, the need for cutting-edge, interoperable technology solutions has never been more critical. Data interoperability should be the cornerstone for informed decision-making …

How IDIADA optimized its intelligent chatbot with Amazon Bedrock

This post is co-written with Xavier Vizcaino, Diego Martín Montoro, and Jordi Sánchez Ferrer from Applus+ Idiada. In 2021, Applus+ IDIADA, a global partner to the automotive industry with over 30 years of experience supporting customers in product development activities through design, engineering, testing, and homologation services, established the Digital Solutions department. This strategic move …

Enhancing AlloyDB vector search with inline filtering and enterprise observability

Many specialized vector databases today require you to create complex pipelines and applications in order to get the data you need. AlloyDB for PostgreSQL offers ScaNN, Google Research’s state-of-the-art vector search index, enabling you to optimize end-to-end retrieval of the freshest, most relevant data with a single SQL statement. Today, we are introducing a …

Mistral-Small-24B-Instruct-2501 is now available on SageMaker JumpStart and Amazon Bedrock Marketplace

Today, we’re excited to announce that Mistral-Small-24B-Instruct-2501, a 24-billion-parameter large language model (LLM) from Mistral AI that’s optimized for low-latency text generation tasks, is available for customers through Amazon SageMaker JumpStart and Amazon Bedrock Marketplace. Amazon Bedrock Marketplace is a new capability in Amazon Bedrock that developers can use to discover, test, and use over 100 …