Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and …

agent blog 1

Implement human-in-the-loop confirmation with Amazon Bedrock Agents

Agents are revolutionizing how businesses automate complex workflows and decision-making processes. Amazon Bedrock Agents helps you accelerate generative AI application development by orchestrating multi-step tasks. Agents use the reasoning capability of foundation models (FMs) to break down user-requested tasks into multiple steps. In addition, they use the developer-provided instruction to create an orchestration plan and …

image1 jvsf74lmax 1000x1000 1

Delivering an application-centric, AI-powered cloud for developers and operators

Today we’re unveiling new AI capabilities to help cloud developers and operators at every step of the application lifecycle. We are doing this by: Putting applications at the center of your cloud experience, abstracting away the infrastructure complexities of the traditional cloud model. Now you can design, observe, secure, and optimize at the application level, …

Do LLMs Estimate Uncertainty Well in Instruction-Following?

Large language models (LLMs) could be valuable personal AI agents across various domains, provided they can precisely follow user instructions. However, recent studies have shown significant limitations in LLMs’ instruction-following capabilities, raising concerns about their reliability in high-stakes applications. Accurately estimating LLMs’ uncertainty in adhering to instructions is critical to mitigating deployment risks. We present, …

How Netflix Accurately Attributes eBPF Flow Logs

By Cheng Xie, Bryan Shultz, and Christine Xu In a previous blog post, we described how Netflix uses eBPF to capture TCP flow logs at scale for enhanced network insights. In this post, we delve deeper into how Netflix solved a core problem: accurately attributing flow IP addresses to workload identities. A Brief Recap FlowExporter is …

ifood 4

How iFood built a platform to run hundreds of machine learning models with Amazon SageMaker Inference

Headquartered in São Paulo, Brazil, iFood is a national private company and the leader in food-tech in Latin America, processing millions of orders monthly. iFood has stood out for its strategy of incorporating cutting-edge technology into its operations. With the support of AWS, iFood has developed a robust machine learning (ML) inference infrastructure, using services …

Apple Workshop on Natural Language Understanding 2024

Progress in natural language processing enables more intuitive ways of interacting with technology. For example, many of Apple’s products and services, including Siri and search, use natural language understanding and generation to enable a fluent and seamless interface experience for users. Natural language is a rapidly moving area of machine learning research, and includes work …

ML 18512 image001

Llama 4 family of models from Meta are now available in SageMaker JumpStart

Today, we’re excited to announce the availability of Llama 4 Scout and Maverick models in Amazon SageMaker JumpStart and coming soon in Amazon Bedrock. Llama 4 represents Meta’s most advanced multimodal models to date, featuring a mixture of experts (MoE) architecture and context window support up to 10 million tokens. With native multimodality and early fusion …

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

Large Language Models (LLMs) have transformed natural language processing, but face significant challenges in widespread deployment due to their high runtime cost. In this paper, we introduce SeedLM, a novel post-training compression method that uses seeds of a pseudo-random generator to encode and compress model weights. Specifically, for each block of weights, we find a …

claudio

Prompting for the best price-performance

In the drive to remain competitive, businesses today are turning to AI to help them minimize cost and maximize efficiency. It’s incumbent on them to find the most suitable AI model—the one that will help them achieve more while spending less. For many businesses, the migration from OpenAI’s model family to Amazon Nova represents not …