Benchmarking hallucinations: New metric tracks where multimodal reasoning models go wrong

Over the past decades, computer scientists have introduced increasingly sophisticated machine learning models that perform remarkably well on a wide range of tasks. These include multimodal large language models (MLLMs), systems that can process and generate different types of data, predominantly text, images and video.

Deploy Qwen models with Amazon Bedrock Custom Model Import

We’re excited to announce that Amazon Bedrock Custom Model Import now supports Qwen models. You can now import custom weights for Qwen2, Qwen2_VL, and Qwen2_5_VL architectures, including models like Qwen 2, 2.5 Coder, Qwen 2.5 VL, and QwQ 32B. You can bring your own customized Qwen models into Amazon Bedrock and deploy them in a fully managed, serverless environment—without having to …

How good is your AI? Gen AI evaluation at every stage, explained

As AI moves from promising experiments to delivering core business impact, the most critical question is no longer “What can it do?” but “How well does it do it?” Ensuring the quality, reliability, and safety of your AI applications is a strategic imperative. To guide you, evaluation must be your North Star: a constant process that …