Defensive Databases: Optimizing Index-Refresh Semantics

Editor’s Note: This is the first post in a series exploring how Palantir customizes infrastructure software for reliable operation at scale. Written by the Foundations organization (which owns the foundational technologies backing all our software, including our storage infrastructure), this post details our experience tuning and customizing Elasticsearch (ES) without forking the source code. We have two primary …

Running deep research AI agents on Amazon Bedrock AgentCore

AI agents are evolving beyond basic single-task helpers into more powerful systems that can plan, critique, and collaborate with other agents to solve complex problems. Deep Agents—a recently introduced framework built on LangGraph—bring these capabilities to life, enabling multi-agent workflows that mirror real-world team dynamics. The challenge, however, is not just building such agents but …

AI Innovators: How JAX on TPU is helping Escalante advance AI-driven protein design

As a Python library for accelerator-oriented array computation and program transformation, JAX is widely recognized for its power in training large-scale AI models. But its core design as a system for composable function transformations unlocks its potential in a much broader scientific landscape. Following our recent post on solving high-order partial differential equations, or PDEs, …

Scaling Muse: How Netflix Powers Data-Driven Creative Insights at Trillion-Row Scale

By Andrew Pierce, Chris Thrailkill, and Victor Chiapaikeo. At Netflix, we prioritize getting timely data and insights into the hands of the people who can act on them. One of our key internal applications for this purpose is Muse. Muse’s ultimate goal is to help Netflix members discover content they’ll love by ensuring our promotional media …

Rapid ML experimentation for enterprises with Amazon SageMaker AI and Comet

This post was written with Sarah Ostermeier from Comet. As enterprise organizations scale their machine learning (ML) initiatives from proof of concept to production, the complexity of managing experiments, tracking model lineage, and managing reproducibility grows exponentially. This is primarily because data scientists and ML engineers constantly explore different combinations of hyperparameters, model architectures, and …

Empowering Netflix Engineers with Incident Management

By Molly Struve. Netflix’s mission to provide seamless entertainment to hundreds of millions of users globally demands exceptional reliability. At the heart of this reliability is how we handle incidents: those inevitable moments when something doesn’t go as expected. Teams can respond quickly and more effectively when incidents are managed consistently across a company. A robust process …

Move your AI agents from proof of concept to production with Amazon Bedrock AgentCore

Building an AI agent that can handle a real-life use case in production is a complex undertaking. Although creating a proof of concept demonstrates the potential, moving to production requires addressing scalability, security, observability, and operational concerns that don’t surface in development environments. This post explores how Amazon Bedrock AgentCore helps you transition your agentic applications …

Scale visual production using Stability AI Image Services in Amazon Bedrock

This post was written with Alex Gnibus of Stability AI. Stability AI Image Services are now available in Amazon Bedrock, offering ready-to-use media editing capabilities delivered through the Amazon Bedrock API. These image editing tools expand on the capabilities of Stability AI’s Stable Diffusion 3.5 models (SD3.5) and Stable Image Core and Ultra models, which …