The Super Weight in Large Language Models

Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameter outliers are disproportionately important to the quality of the model. LLMs contain billions of parameters, so these small fractions, such as 0.01%, translate to hundreds of thousands of parameters. In this work, we present an even more surprising finding: …

Driving Content Delivery Efficiency Through Classifying Cache Misses

By Vipul Marlecha, Lara Deek, Thiara Ortiz

The mission of Open Connect, our dedicated content delivery network (CDN), is to deliver the best quality of experience (QoE) to our members. By localizing our Open Connect Appliances (OCAs), we bring Netflix content closer to the end user. This is achieved through close partnerships with internet service providers …

Optimize RAG in production environments using Amazon SageMaker JumpStart and Amazon OpenSearch Service

Generative AI has revolutionized customer interactions across industries by offering personalized, intuitive experiences powered by unprecedented access to information. This transformation is further enhanced by Retrieval Augmented Generation (RAG), a technique that allows large language models (LLMs) to reference external knowledge sources beyond their training data. RAG has gained popularity for its ability to improve …
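The RAG pattern described above can be sketched end to end: embed the query, rank external documents by similarity, and prepend the best matches to the prompt. This is a minimal illustration only — the `embed` function below is a toy bag-of-letters stand-in for a real embedding model, and `build_prompt` is a hypothetical helper, not the SageMaker JumpStart or OpenSearch Service API.

```python
import math

def embed(text):
    # Toy embedding: normalized letter-frequency vector. A production RAG
    # system would call a real embedding model instead (an assumption here).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, documents, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    # Splice retrieved context ahead of the question so the LLM can
    # ground its answer in knowledge outside its training data.
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```

In a production deployment the retrieval step would typically hit a vector index (such as OpenSearch's k-NN search) rather than scanning documents in memory, but the data flow — embed, retrieve, augment, generate — is the same.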

A guide to converting ADK agents with MCP to the A2A framework

The evolution of AI agents has led to powerful, specialized models capable of complex tasks. The Google Agent Development Kit (ADK) – a toolkit designed to simplify the construction and management of language model-based applications – makes it easy for developers to build agents, usually equipped with tools via the Model Context Protocol (MCP) for …

Use Amazon SageMaker Unified Studio to build complex AI workflows using Amazon Bedrock Flows

Organizations face the challenge of managing data, multiple artificial intelligence and machine learning (AI/ML) tools, and workflows across different environments, which impacts productivity and governance. A unified development environment consolidates data processing, model development, and AI application deployment into a single system. This integration streamlines workflows, enhances collaboration, and accelerates AI solution development from concept to …

From R&D to Real-World Impact

Palantir’s Advice for the White House OSTP’s AI R&D Plan

Editor’s Note: This blog post highlights Palantir’s response to a Request for Information regarding the redrafting of the National AI R&D Plan. For more information on Palantir’s contributions to AI policy, visit our website.

Improving the Nation’s AI R&D Plan

As the AI race evolves, so too …

Build and deploy AI inference workflows with new enhancements to the Amazon SageMaker Python SDK

Amazon SageMaker Inference has been a popular tool for deploying advanced machine learning (ML) and generative AI models at scale. As AI applications become increasingly complex, customers want to deploy multiple models in a coordinated group that collectively process inference requests for an application. In addition, with the evolution of generative AI applications, many use …

How to build Web3 AI agents with Google Cloud

For over two decades, Google has been a pioneer in AI, conducting groundwork that has shaped the industry. Concurrently, in the Web3 space, Google focuses on empowering the developer community by providing public goods resources like BigQuery blockchain datasets and testnet faucets, as well as the cloud infrastructure builders will need to bring their decentralized …

Evaluating Long Range Dependency Handling in Code Generation LLMs

As language models support larger and larger context sizes, evaluating their ability to make effective use of that context becomes increasingly important. We analyze the ability of several code generation models to handle long range dependencies using a suite of multi-step key retrieval tasks in context windows up to 8k tokens in length. The tasks …
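A multi-step key-retrieval task like the ones described can be generated synthetically: hide a chain of key-to-key definitions at random positions in filler text, then ask the model to follow the chain to the final value. The sketch below is a hypothetical generator under that assumption — the abstract does not specify the authors' exact task construction, so names like `make_key_retrieval_prompt` are illustrative.

```python
import random

def make_key_retrieval_prompt(chain_length=3, filler_lines=50, seed=0):
    """Build a prompt hiding a chain key_0 -> key_1 -> ... -> answer in filler.

    Answering correctly requires chain_length + 1 retrieval hops spread
    across the context window, so longer filler stresses long-range use.
    """
    rng = random.Random(seed)
    keys = [f"key_{i}" for i in range(chain_length + 1)]
    # Each definition points to the next key; the last one holds the answer.
    defs = [f"The value of {keys[i]} is {keys[i + 1]}." for i in range(chain_length)]
    answer = f"secret_{rng.randint(1000, 9999)}"
    defs.append(f"The value of {keys[-1]} is {answer}.")
    lines = [f"filler line {i}: nothing of interest here." for i in range(filler_lines)]
    # Scatter the definitions at random positions among the filler lines.
    for d in defs:
        lines.insert(rng.randrange(len(lines) + 1), d)
    prompt = "\n".join(lines) + f"\n\nWhat is the value of {keys[0]}? Follow the chain."
    return prompt, answer
```

Scoring is then a simple string match: a model's completion passes if it contains the hidden answer. Scaling `filler_lines` until the prompt approaches the context limit (e.g. 8k tokens) probes how retrieval accuracy degrades with distance between hops.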

AWS costs estimation using Amazon Q CLI and AWS Cost Analysis MCP

Managing and optimizing AWS infrastructure costs is a critical challenge for organizations of all sizes. Traditional cost analysis approaches often involve the following:

- Complex spreadsheets – Creating and maintaining detailed cost models, which requires significant effort
- Multiple tools – Switching between the AWS Pricing Calculator, AWS Cost Explorer, and third-party tools
- Specialized knowledge – Understanding …