On the Benefits of Pixel-Based Hierarchical Policies for Task Generalization

Reinforcement learning practitioners often avoid hierarchical policies, especially in image-based observation spaces. Typically, the single-task performance improvement over flat-policy counterparts does not justify the additional complexity associated with implementing a hierarchy. However, by introducing multiple decision-making levels, hierarchical policies can compose lower-level policies to more effectively generalize between tasks, highlighting the need for multi-task evaluations. …

ML 17050 image001

Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation

Amazon Q Business is a conversational assistant powered by generative artificial intelligence (AI) that enhances workforce productivity by answering questions and completing tasks based on information in your enterprise systems, which each user is authorized to access. In an earlier post, we discussed how you can build private and secure enterprise generative AI applications with …

modelgarden ai21 walkthrough

Announcing the Jamba 1.5 Model Family from AI21 Labs on Vertex AI

Today, we’re announcing the launch of the Jamba 1.5 Model Family  — AI21 Labs’ new family of open models — in public preview on Vertex AI Model Garden. The model family includes two models designed for scaled enterprise applications:   Jamba 1.5 Mini: AI21’s most efficient and lightweight model, engineered for speed and efficiency in tasks …

ML 17387 image001

Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

Today, we are excited to announce general availability of batch inference for Amazon Bedrock. This new feature enables organizations to process large volumes of data when interacting with foundation models (FMs), addressing a critical need in various industries, including call center operations. Call center transcript summarization has become an essential task for businesses seeking to …

GPU blog gif 2

Run your AI inference applications on Cloud Run with NVIDIA GPUs

Developers love Cloud Run for its simplicity, fast autoscaling, scale-to-zero capabilities, and pay-per-use pricing. Those same benefits come into play for real-time inference apps serving open gen AI models. That’s why today, we’re adding support for NVIDIA L4 GPUs to Cloud Run, in preview. This opens the door to many new use cases to Cloud …

Lightweight Champ: NVIDIA Releases Small Language Model With State-of-the-Art Accuracy

Developers of generative AI typically face a tradeoff between model size and accuracy. But a new language model released by NVIDIA delivers the best of both, providing state-of-the-art accuracy in a compact form factor. Mistral-NeMo-Minitron 8B — a miniaturized version of the open Mistral NeMo 12B model released by Mistral AI and NVIDIA last month …

ml 17291 image001

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Amazon SageMaker Canvas is a low-code no-code visual interface to build and deploy ML models without the need to write code. Based on customers’ feedback, …

Can You Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Self-supervised features are typically used in place of filter-bank features in speaker verification models. However, these models were originally designed to ingest filter-banks as inputs, and thus, training them on self-supervised features assumes that both feature types require the same amount of learning for the task. In this work, we observe that pre-trained self-supervised speech …

ML 17331 arch diagram

Cohere Rerank 3 Nimble now generally available on Amazon SageMaker JumpStart

The Cohere Rerank 3 Nimble foundation model (FM) is now generally available in Amazon SageMaker JumpStart. This model is the newest FM in Cohere’s Rerank model series, built to enhance enterprise search and Retrieval Augmented Generation (RAG) systems. In this post, we discuss the benefits and capabilities of this new model with some examples. Overview …

RepCNN: Micro-Sized, Mighty Models for Wakeword Detection

Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits the model’s capacity to learn, and the effectiveness of the usual training algorithms to find the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger …