Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization

Efficient large-scale inference of transformer-based large language models (LLMs) remains a fundamental systems challenge, frequently requiring multi-GPU parallelism to meet stringent latency and throughput targets. Conventional tensor parallelism decomposes matrix operations across devices but introduces substantial inter-GPU synchronization, leading to communication bottlenecks and degraded scalability. We propose the Parallel Track (PT) Transformer, a novel architectural …

ml 19222 2 architecture 1

How Amazon uses Amazon Nova models to automate operational readiness testing for new fulfillment centers

Amazon is a global ecommerce and technology company that operates a vast network of fulfillment centers to store, process, and ship products to customers worldwide. The Amazon Global Engineering Services (GES) team is responsible for facilitating operational readiness across the company’s rapidly expanding network of fulfillment centers. When launching new fulfillment centers, Amazon must verify …

Gemini Enterprise Agent Ready (GEAR) program now available, a new path to building AI agents at scale

Today’s reality is agentic – software that can reason, plan, and act on your behalf to execute complex workflows. To meet this moment, we are excited to open the Gemini Enterprise Agent Ready (GEAR) learning program to everyone. As a new specialized pathway within the Google Developer Program, GEAR empowers developers and pros to build …

ML 20387 image 1

Automated Reasoning checks rewriting chatbot reference implementation

Today, we are publishing a new open source sample chatbot that shows how to use feedback from Automated Reasoning checks to iterate on the generated content, ask clarifying questions, and prove the correctness of an answer. The chatbot implementation also produces an audit log that includes mathematically verifiable explanations for the answer validity and a …

How PARTs Assemble into Wholes: Learning the Relative Composition of Images

The composition of objects and their parts, along with object-object positional relationships, provides a rich source of information for representation learning. Hence, spatial-aware pretext tasks have been actively explored in self-supervised learning. Existing works commonly start from a grid structure, where the goal of the pretext task involves predicting the absolute position index of patches …

jeffzeng

Structured outputs on Amazon Bedrock: Schema-compliant AI responses

Today, we’re announcing structured outputs on Amazon Bedrock—a capability that fundamentally transforms how you can obtain validated JSON responses from foundation models through constrained decoding for schema compliance. This represents a paradigm shift in AI application development. Instead of validating JSON responses and writing fallback logic for when they fail, you can move straight to …

Vertex AI Latency Comparisonpng5wmax 1000x1000 1

How we cut Vertex AI latency by 35% with GKE Inference Gateway

As generative AI moves from experimentation to production, platform engineers face a universal challenge for inference serving: you need low latency, high throughput, and manageable costs.  It is a difficult balance. Traffic patterns vary wildly, from complex coding tasks that require processing huge amounts of data, to quick, chatty conversations that demand instant replies. Standard …

image 1 13

How Associa transforms document classification with the GenAI IDP Accelerator and Amazon Bedrock

This is a guest post co-written with David Meredith and Josh Zacharias from Associa. Associa, North America’s largest community management company, oversees approximately 7.5 million homeowners with 15,000 employees across more than 300 branch offices. The company manages approximately 48 million documents across 26 TB of data, but their existing document management system lacks efficient …

shopify 5g2sFtzmax 1000x1000 1

Announcing Claude Opus 4.6 on Vertex AI

At Google Cloud, we’re committed to providing customers with the leading selection of models to build and scale production-ready AI apps and agents on a platform optimized for performance, trust, and global scale. Today, we’re further expanding Vertex AI’s curated collection of models with the addition of Anthropic’s newest release: Claude Opus 4.6. Claude Opus …

ML 20209 image 1

Accelerating your marketing ideation with generative AI – Part 2: Generate custom marketing images from historical references

Marketing teams face major challenges creating campaigns in today’s digital environment. They must navigate through complex data analytics and rapidly changing consumer preferences to produce engaging, personalized content across multiple channels while maintaining brand consistency and working within tight deadlines. Using generative AI can streamline and accelerate the creative process while maintaining alignment with business …