MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models’ compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results from …

12AEcNQT26foi2TaaYQen3OSg

Empowering the Warfighter: Palantir’s Partnership with Microsoft

Promoting Army readiness through seamless coordination between Palantir-powered Army Vantage platform and Microsoft Power BI Better Together As the Department of Defense (DoD) increasingly relies on software and data to drive mission readiness and operations, the need for cutting-edge, interoperable technology solutions has never been more critical. Data interoperability should be the cornerstone for informed decision-making …

ML 17004 neighbors 1

How IDIADA optimized its intelligent chatbot with Amazon Bedrock

This post is co-written with Xavier Vizcaino, Diego Martín Montoro, and Jordi Sánchez Ferrer from Applus+ Idiada. In 2021, Applus+ IDIADA, a global partner to the automotive industry with over 30 years of experience supporting customers in product development activities through design, engineering, testing, and homologation services, established the Digital Solutions department. This strategic move …

Enhancing AlloyDB vector search with inline filtering and enterprise observability

Many specialized vector databases today require you to create complex pipelines and applications in order to get the data you need. AlloyDB for PostgreSQL offers Google Research’s, state-of-the-art vector search index, ScaNN, enabling you to optimize the end-to-end retrieval of the most fresh, relevant data with a single SQL statement. Today, we are introducing a …

ML 18221 image001

Mistral-Small-24B-Instruct-2501 is now available on SageMaker Jumpstart and Amazon Bedrock Marketplace

Today, we’re excited to announce that Mistral-Small-24B-Instruct-2501—a twenty-four billion parameter large language model (LLM) from Mistral AI that’s optimized for low latency text generation tasks—is available for customers through Amazon SageMaker JumpStart and Amazon Bedrock Marketplace. Amazon Bedrock Marketplace is a new capability in Amazon Bedrock that developers can use to discover, test, and use over 100 …

Announcing Claude 3.7 Sonnet, Anthropic’s first hybrid reasoning model, is available on Vertex AI

Today, we’re announcing Claude 3.7 Sonnet, Anthropic’s most intelligent model to date and the first hybrid reasoning model on the market, is available in preview on Vertex AI Model Garden. Claude 3.7 Sonnet can produce quick responses or extended, step-by-step thinking that is made visible to the user. Claude 3.7 Sonnet includes improvements in coding, …

Everyone Has Given Up on AI Safety, Now What?

The End of the AI Safety Debate For years, a passionate contingent of researchers, ethicists, and policymakers warned about the potential dangers of unchecked artificial intelligence development. They argued about p(doom) probabilities, AI alignment strategies, and regulations that could prevent catastrophe. But as of now, that conversation has all but collapsed. The frontier AI companies—OpenAI, …

RKT LegacyFlow

How Rocket Companies modernized their data science solution on AWS

This post was written with Dian Xu and Joel Hawkins of Rocket Companies. Rocket Companies is a Detroit-based FinTech company with a mission to “Help Everyone Home”. With the current housing shortage and affordability concerns, Rocket simplifies the homeownership process through an intuitive and AI-driven experience. This comprehensive framework streamlines every step of the homeownership …