SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models

With the rapid expansion in the scale of large language models (LLMs), enabling efficient distributed inference across multiple computing units has become increasingly critical. However, communication overheads from popular distributed inference techniques such as Tensor Parallelism pose a significant challenge to achieve scalability and low latency. Therefore, we introduce a novel optimization technique, Sync-Point Drop …

Picture1 3

Optimize query responses with user feedback using Amazon Bedrock embedding and few-shot prompting

Improving response quality for user queries is essential for AI-driven applications, especially those focusing on user satisfaction. For example, an HR chat-based assistant should strictly follow company policies and respond using a certain tone. A deviation from that can be corrected by feedback from users. This post demonstrates how Amazon Bedrock, combined with a user …

Announcing Anthropic’s Claude Opus 4 and Claude Sonnet 4 on Vertex AI

Today, we’re expanding the choice of third-party models available in Vertex AI Model Garden with the addition of Anthropic’s newest generation of the Claude model family: Claude Opus 4 and Claude Sonnet 4. Both Claude Opus 4 and Claude Sonnet 4 are hybrid reasoning models, meaning they offer modes for near-instant responses and extended thinking …

Could AI understand emotions better than we do?

Is artificial intelligence (AI) capable of suggesting appropriate behavior in emotionally charged situations? A team put six generative AIs — including ChatGPT — to the test using emotional intelligence (EI) assessments typically designed for humans. The outcome: these AIs outperformed average human performance and were even able to generate new tests in record time. These …

Humanoid Policy ~ Human Policy

Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embodiment training data for robot …