
Amazon SageMaker AI in 2025, a year in review part 1: Flexible Training Plans and improvements to price performance for inference workloads

In 2025, Amazon SageMaker AI saw dramatic improvements to core infrastructure offerings along four dimensions: capacity, price performance, observability, and usability. In this series of posts, we discuss these various improvements and their benefits. In Part 1, we discuss capacity improvements with the launch of Flexible Training Plans. We also describe improvements to price performance …


Build AI workflows on Amazon EKS with Union.ai and Flyte

As artificial intelligence and machine learning (AI/ML) workflows grow in scale and complexity, it becomes harder for practitioners to organize and deploy their models. AI projects often struggle to move from pilot to production, failing not because the models are bad, but because infrastructure and processes are fragmented and brittle, and the original …


Using Google Cloud AI to measure the physics of U.S. freestyle snowboarding and skiing

Nearly every snowboard trick carries a number. A 1080 means three full rotations. A 1440 means four. The convention is simple: add up every rotation around every axis and count in 180° increments. For decades it’s served as the sport’s universal shorthand for difficulty. Judges, coaches, and athletes all speak this language fluently. It’s also, …
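The counting convention described above is simple arithmetic, and can be sketched as a small helper (the function name and interface are illustrative, not from the article):

```python
def trick_number(rotations_deg):
    """Name a trick by its total rotation, counted in 180-degree increments.

    rotations_deg: per-axis rotation totals in degrees, e.g. [1080]
    for three flat spins, or [720, 360] for a cork plus a full spin.
    """
    total = sum(rotations_deg)
    # Round down to the nearest 180-degree increment, as the convention counts.
    return (total // 180) * 180

# Three full rotations around one axis: a 1080.
print(trick_number([1080]))
# 720 degrees off-axis plus a 360 flat spin also totals a 1080.
print(trick_number([720, 360]))
```

The point of the article is precisely that this shorthand flattens multi-axis motion into a single number; measuring the underlying physics requires more than this sum.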

Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment

Query Auto-Completion (QAC) is a critical feature of modern search systems that improves search efficiency by suggesting completions as users type. However, existing approaches face fundamental challenges: traditional retrieve-and-rank pipelines have poor long-tail coverage and require extensive feature engineering, while recent generative methods suffer from hallucination and safety risks. We present a unified framework that …
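For context, the "traditional retrieve-and-rank pipeline" the abstract contrasts against can be reduced to a toy baseline like the following (a minimal sketch with an invented query log; real systems use learned rankers and many engineered features):

```python
from collections import Counter

def complete(prefix, query_log, k=3):
    """Retrieve-and-rank QAC baseline: retrieve logged queries that
    match the typed prefix, then rank them by historical frequency."""
    counts = Counter(query_log)
    # Retrieval step: only queries seen before can ever be suggested,
    # which is why long-tail coverage is poor.
    candidates = [q for q in counts if q.startswith(prefix)]
    # Ranking step: most frequent completions first.
    return sorted(candidates, key=lambda q: -counts[q])[:k]

log = ["weather today", "weather tomorrow", "weather today", "web mail"]
print(complete("wea", log))
```

A generative model removes the seen-before restriction, which is exactly where the hallucination and safety risks the abstract mentions come in.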


Build unified intelligence with Amazon Bedrock AgentCore

Building cohesive and unified customer intelligence across your organization starts with reducing the friction your sales representatives face when toggling between Salesforce, support tickets, and Amazon Redshift. A sales representative preparing for a customer meeting might spend hours clicking through several different dashboards (product recommendations, engagement metrics, revenue analytics, and so on) before developing a complete picture …


Powering the next generation of agents with Google Cloud databases

For developers building AI applications, including custom agents and chatbots, the open-source Model Context Protocol (MCP) standard enables your innovations to access data and tools consistently and securely. At the end of 2025, we introduced managed and remote MCP support for services like Google Maps and BigQuery, establishing a standard method for AI to connect …

Models That Prove Their Own Correctness

How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train Self-Proving models that prove the correctness of their output …

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workflows, making semantic caching essential for reducing inference cost and latency. Production deployments typically use a tiered static-dynamic design: a static cache of curated, offline vetted responses mined from logs, backed by a dynamic cache populated online. In practice, both …
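The tiered static-dynamic design the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's system: it assumes precomputed query embeddings and a plain cosine-similarity threshold, with no verification or async machinery.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class TieredSemanticCache:
    """Static tier: curated, offline-vetted responses mined from logs.
    Dynamic tier: populated online at serving time."""

    def __init__(self, threshold=0.9):
        self.static = []   # (embedding, response) pairs, vetted offline
        self.dynamic = []  # (embedding, response) pairs, filled online
        self.threshold = threshold

    def _lookup(self, tier, emb):
        best, best_sim = None, 0.0
        for e, resp in tier:
            sim = cosine(emb, e)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold else None

    def get(self, emb):
        # Prefer the curated static tier; fall back to the dynamic tier.
        return self._lookup(self.static, emb) or self._lookup(self.dynamic, emb)

    def put_dynamic(self, emb, response):
        # On a miss, the LLM's fresh response is cached online.
        self.dynamic.append((emb, response))
```

On a hit, the cached response is served without invoking the LLM; the paper's contribution concerns keeping both tiers trustworthy, which this sketch omits.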

A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation

What research can be pursued with small models trained to complete true programs? Typically, researchers study program synthesis via large language models (LLMs), which introduce issues such as uncertainty about what is in or out of distribution, difficulty isolating the effects of fine-tuning and tokenization, and high compute and storage demands for running experiments. …


Scaling LLM Post-Training at Netflix

Baolin Li, Lingyi Liu, Binh Tang, Shaojing Li

Pre-training gives Large Language Models (LLMs) broad linguistic ability and general world knowledge, but post-training is the phase that actually aligns them to concrete intents, domain constraints, and the reliability requirements of production environments. At Netflix, we are exploring how LLMs can enable new member experiences across …