SpeakStream: Streaming Text-to-Speech with Interleaved Data

With the increasing integration of speech front-ends and large language models (LLMs), there is a need to explore architectures that integrate these modalities. While end-to-end models have been explored extensively, cascaded models that stream outputs from LLMs to TTS seem to be oddly under-explored, even though they are potentially much simpler. Using traditional text-to-speech systems …

Revolutionizing earth observation with geospatial foundation models on AWS

Emerging transformer-based vision models for geospatial data—also called geospatial foundation models (GeoFMs)—offer a new and powerful technology for mapping the earth’s surface at a continental scale, providing stakeholders with the tooling to detect and monitor surface-level ecosystem conditions such as forest degradation, natural disaster impact, crop yield, and many others. GeoFMs represent an emerging research …

Create shareable generative AI apps in less than 60 seconds with Vertex AI and Cloud Run

Want to turn your generative AI ideas into real web applications with one click? Any developer knows it’s a complex process to build shareable, interactive applications: you have to set up infrastructure, wire APIs, and build a front-end. What if you could skip the heavy lifting and turn your generative …

Digital Marketing Courses to Sell Digital Marketing Courses

There’s a strange loop taking over social media right now. Scroll through TikTok, YouTube Live, or Instagram, and you’ll see a parade of “digital marketing experts” promoting their latest PDF guide, online course, or coaching program. What’s it about? Digital marketing. But not the kind that helps actual businesses improve performance; it’s a course on …

Interleaved Reasoning for Large Language Models via Reinforcement Learning

Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. We observe that models inherently possess the ability …

Part 3: Building an AI-powered assistant for investment research with multi-agent collaboration in Amazon Bedrock and Amazon Bedrock Data Automation

In the financial services industry, analysts need to switch between structured data (such as time-series pricing information), unstructured text (such as SEC filings and analyst reports), and audio/visual content (earnings calls and presentations). Each format requires different analytical approaches and specialized tools, creating workflow inefficiencies. Add on top of this the intense time pressure resulting …

CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling

Mixture-of-Experts (MoE) models are crucial for scaling model capacity while controlling inference costs. While integrating MoE into multimodal models like CLIP improves performance, training these models is notoriously challenging and expensive. We propose CLIP-Upcycling (CLIP-UP), an efficient alternative training strategy that converts a pre-trained dense CLIP model into a sparse MoE architecture. Through extensive experimentation …

New Amazon Bedrock Data Automation capabilities streamline video and audio analysis

Organizations across a wide range of industries are struggling to process massive amounts of unstructured video and audio content to support their core business applications and organizational priorities. Amazon Bedrock Data Automation helps them meet this challenge by streamlining application development and automating workflows that use content from documents, images, audio, and video. Recently, we …

Calling all devs: Build multi-agent systems in the Agent Development Kit Hackathon with Google Cloud

Heard of AI agents lately? We know many of you are itching to start building them! Here’s your chance with the Agent Development Kit Hackathon with Google Cloud. Everyone’s talking about AI agents, but the real magic happens when they collaborate to tackle complex tasks. Think: complex processes, data analysis, content creation, and customer support. …