Over-Searching in Search-Augmented Large Language Models

Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. However, they often over-search: they invoke a search tool even when it does not improve response quality, which wastes computation and can cause hallucinations by incorporating irrelevant context. In this work, we conduct a systematic evaluation of over-searching across multiple dimensions, including …

How Omada Health scaled patient care by fine-tuning Llama models on Amazon SageMaker AI

This post is co-written with Sunaina Kavi, AI/ML Product Manager at Omada Health. Omada Health, a longtime innovator in virtual healthcare delivery, launched a new nutrition experience in 2025, featuring OmadaSpark, an AI agent trained with robust clinical input that delivers real-time motivational interviewing and nutrition education. It was built on AWS. OmadaSpark was designed …

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. …

AdaBoN: Adaptive Best-of-N Alignment

Recent advances in test-time alignment methods, such as Best-of-N sampling, offer a simple and effective way to steer language models (LMs) toward preferred behaviors using reward models (RMs). However, these approaches can be computationally expensive, especially when applied uniformly across prompts without accounting for differences in alignment difficulty. In this work, we propose a prompt-adaptive …

Crossmodal search with Amazon Nova Multimodal Embeddings

Amazon Nova Multimodal Embeddings processes text, documents, images, video, and audio through a single model architecture. Available through Amazon Bedrock, the model converts different input modalities into numerical embeddings within the same vector space, supporting direct similarity calculations regardless of content type. We developed this unified model to reduce the need for separate embedding models, …

Scaling medical content review at Flo Health using Amazon Bedrock (Part 1)

This blog post is based on work co-developed with Flo Health. Healthcare science is rapidly advancing. Maintaining accurate and up-to-date medical content directly impacts people’s lives, health decisions, and well-being. When someone searches for health information, they are often at their most vulnerable, making accuracy not just important, but potentially life-saving. Flo Health creates thousands …

Publication of false claims about Palantir and the Swiss Government by the magazine Die Republik

Correction: Publication of false claims about Palantir and the Swiss Government by the magazine ‘Die Republik’. Introduction: An article published in Republik in December 2025 draws attention to a 2024 report by the Swiss Armed Forces Staff (Armeestab) that assessed the feasibility of deploying software developed by Palantir. The article presents the situation in a false and misleading way, according to …

Build data analytics agents faster with BigQuery’s fully managed, remote MCP server

Connecting AI agents to your enterprise data shouldn’t require complex custom integrations or weeks of development. With the release of fully managed, remote Model Context Protocol (MCP) servers for Google services last month, you can now use the BigQuery MCP server to give your AI agents a direct, secure way to analyze data. This fully managed …

Improving User Interface Generation Models from Designer Feedback

Despite being trained on vast amounts of data, most LLMs are unable to reliably generate well-designed UIs. Designer feedback is essential to improving performance on UI generation; however, we find that existing RLHF methods based on ratings or rankings are not well-aligned with designers’ workflows and ignore the rich rationale used to critique and improve …

NVIDIA Rubin Platform, Open Models, Autonomous Driving: NVIDIA Presents Blueprint for the Future at CES

NVIDIA founder and CEO Jensen Huang took the stage at the Fontainebleau Las Vegas today to open CES 2026, declaring that AI is scaling into every domain and every device. “Computing has been fundamentally reshaped as a result of accelerated computing, as a result of artificial intelligence,” Huang said. “What that means is some $10 …