FastVLM: Efficient Vision Encoding for Vision Language Models

Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as ViTs become inefficient at high resolutions due to the large number of tokens and high encoding latency. At different operational resolutions, the vision encoder of a VLM …

Build a FinOps agent using Amazon Bedrock with multi-agent capability and Amazon Nova as the foundation model

AI agents are revolutionizing how businesses enhance their operational capabilities and enterprise applications. By enabling natural language interactions, these agents provide customers with a streamlined, personalized experience. Amazon Bedrock Agents uses the capabilities of foundation models (FMs), combining them with APIs and data to process user requests, gather information, and execute specific tasks effectively. The …

Disentangled Representation Learning with the Gromov-Monge Gap

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging …

Palantir’s Blueprint for Early Career Success in Product Design

Editor’s Note: Product Designers are key members of Palantir product teams. This blog post features a banner by Product Designer Sarah, a self-reflection by Product Designer Phoebe on navigating her early career, and a Q&A with design colleagues. Insights from Palantir Product Design. Phoebe, Product Designer: I’m still figuring out what kind of designer I want to …

Add Zoom as a data accessor to your Amazon Q index

For many organizations, vast amounts of enterprise knowledge are scattered across diverse data sources and applications. Organizations across industries seek to use this cross-application enterprise data from within their preferred systems while adhering to their established security and governance standards. This post demonstrates how Zoom users can access their Amazon Q Business enterprise data directly …

Scaling Laws for Native Multimodal Models

Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders to LLMs and continuing multimodal training. While such approaches exhibit remarkable sample efficiency, it remains an open question whether such late-fusion architectures are inherently superior. In …

Automate Amazon EKS troubleshooting using an Amazon Bedrock agentic workflow

As organizations scale their Amazon Elastic Kubernetes Service (Amazon EKS) deployments, platform administrators face increasing challenges in efficiently managing multi-tenant clusters. Tasks such as investigating pod failures, addressing resource constraints, and resolving misconfiguration can consume significant time and effort. Instead of spending valuable engineering hours manually parsing logs, tracking metrics, and implementing fixes, teams should …

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025. Visual understanding is inherently contextual – what we focus on in an image depends on the task at hand. For instance, given an image of a person holding a bouquet of flowers, we may focus on either the person …