SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Multimodal large language models (MLLMs) are increasingly deployed in real-world, agentic settings where outputs must not only be correct but also conform to predefined data schemas. Despite recent progress in structured generation in the textual domain, there is still no benchmark that systematically evaluates schema-grounded information extraction and reasoning over visual inputs. In this work, we …

Using MCP with Web3: How to secure agents making blockchain transactions

At Google Cloud, we sit at a unique intersection of two transformative technologies: AI and Web3. The rise of AI agents capable of interacting with blockchains opens up a world of automated financial strategies, fast payments, and more advanced scenarios such as executing complex DeFi operations and bridging assets across multiple chains. However, the practical viability …

Building Trust at Scale

The Next Generation of Audit Logging at Palantir. Every day, organizations entrust Palantir platforms with their most sensitive data and critical operations. From government agencies coordinating national security missions to healthcare providers safeguarding patient information to financial institutions detecting fraud, our customers depend on us to help them make decisions that matter. This trust isn’t given …

AV1 — Now Powering 30% of Netflix Streaming

By Liwei Guo, Zhi Li, Sheldon Radford, and Jeff Watts. Streaming video has become an integral part of our daily lives. At Netflix, our top priority is delivering the best possible entertainment experience to our members, regardless of their devices or network conditions. One of the key technologies enabling this is AV1, …

Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer

As large language models (LLMs) continue to grow in size and complexity, the time it takes to load them from storage to accelerator memory for inference can become a significant bottleneck. This “cold start” problem isn’t just a minor delay — it’s a critical barrier to building resilient, scalable, and cost-effective AI services. Every minute …

Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language

Automated interpretability aims to translate large language model (LLM) features into human-understandable descriptions. However, these natural language feature descriptions are often vague and inconsistent, and they require manual relabeling. In response, we introduce semantic regexes, structured language descriptions of LLM features. By combining primitives that capture linguistic and semantic feature patterns with modifiers for contextualization, composition, …

Hybrid Modeling of Photoplethysmography for Non-Invasive Monitoring of Cardiovascular Parameters

Continuous cardiovascular monitoring can play a key role in precision health. However, some fundamental cardiac biomarkers of interest, including stroke volume and cardiac output, require invasive measurements, e.g., arterial pressure waveforms (APW). As a non-invasive alternative, photoplethysmography (PPG) measurements are routinely collected in hospital settings. Unfortunately, the prediction of key cardiac biomarkers from PPG instead …

GKE Turns 10 Hackathon: Announcing the winners and highlights

The GKE Turns 10 Hackathon was an electrifying showcase of developer ingenuity! Building on the excitement from our initial announcement, the hackathon challenged participants to build powerful AI agents that interact with microservice applications using the robustness of Google Kubernetes Engine (GKE) and the intelligence of Google AI models like Gemini. The goal was to …