Our vision for building a universal AI assistant
We’re extending Gemini to become a world model that can make plans and imagine new experiences by simulating aspects of the world.
With the increasing integration of speech front-ends and large language models (LLMs), there is a need to explore architectures that integrate these modalities. While end-to-end models have been explored extensively, cascaded models that stream outputs from LLMs to TTS remain oddly under-explored, even though they are potentially much simpler. Using traditional text-to-speech systems …
Read more “SpeakStream: Streaming Text-to-Speech with Interleaved Data”
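The SpeakStream teaser contrasts end-to-end models with cascades that stream LLM output into TTS. The sketch below shows only the generic cascade idea: buffer streamed LLM text and hand it to a TTS engine phrase by phrase, so speech can start before generation finishes. It is not SpeakStream's method; `chunk_for_tts`, `BOUNDARIES`, and `min_words` are hypothetical names, and flushing on punctuation is an assumed heuristic.

```python
from typing import Iterable, Iterator, List

# Hypothetical phrase-boundary characters (an assumed heuristic, not from the paper).
BOUNDARIES = (".", "!", "?", ",", ";", ":")


def chunk_for_tts(tokens: Iterable[str], min_words: int = 3) -> Iterator[str]:
    """Group streamed LLM text tokens into phrases a TTS engine can start on.

    A chunk is flushed once the buffered text ends with a boundary character
    and contains at least `min_words` words.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.rstrip()
        if stripped.endswith(BOUNDARIES) and len(stripped.split()) >= min_words:
            yield stripped
            buffer = ""
    # Flush whatever text remains once the LLM stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream: List[str] = ["Well", ",", " the", " plan", " works", ",", " mostly", "."]
    for phrase in chunk_for_tts(stream):
        print(phrase)  # stand-in for a call into a real TTS engine
```

Raising `min_words` trades a little extra latency for longer phrases, which typically give a TTS model more prosodic context.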
Emerging transformer-based vision models for geospatial data—also called geospatial foundation models (GeoFMs)—offer a new and powerful technology for mapping the earth’s surface at a continental scale, providing stakeholders with the tooling to detect and monitor surface-level ecosystem conditions such as forest degradation, natural disaster impact, crop yield, and many others. GeoFMs represent an emerging research …
Read more “Revolutionizing earth observation with geospatial foundation models on AWS”
Want to turn your generative AI ideas into real web applications with one click? As any developer knows, building shareable, interactive applications is complex: you have to set up infrastructure, wire up APIs, and build a front-end. What if you could skip the heavy lifting and turn your generative …
Read more “Create shareable generative AI apps in less than 60 seconds with Vertex AI and Cloud Run”
There’s a strange loop taking over social media right now. Scroll through TikTok, YouTube Live, or Instagram, and you’ll see a parade of “digital marketing experts” promoting their latest PDF guide, online course, or coaching program. What’s it about? Digital marketing. But not the kind that helps actual businesses improve performance; it’s a course on …
Read more “Digital Marketing Courses to Sell Digital Marketing Courses”
Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. We observe that models inherently possess the ability …
Read more “Interleaved Reasoning for Large Language Models via Reinforcement Learning”
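Why interleaving lowers TTFT can be seen with a back-of-the-envelope model. Assume, as a simplification, that latency is proportional to tokens generated and that each hop of a multi-hop question is a (thinking tokens, answer tokens) pair. In a think-then-answer trace, every reasoning token precedes the first answer token; when thinking and answering are interleaved, only the first hop's reasoning does. The token counts and function names below are hypothetical illustrations, not results from the paper.

```python
from typing import List, Tuple

# One hop of a multi-hop question: (thinking_tokens, answer_tokens).
# All counts used with these functions are illustrative assumptions.
Hop = Tuple[int, int]


def ttft_think_then_answer(hops: List[Hop]) -> int:
    """Tokens generated before the first answer token when all reasoning comes first."""
    return sum(think for think, _ in hops)


def ttft_interleaved(hops: List[Hop]) -> int:
    """Tokens generated before the first answer token when each hop's
    (intermediate) answer is emitted right after reasoning about that hop."""
    return hops[0][0]


def total_tokens(hops: List[Hop]) -> int:
    """Total generation length; identical for both orderings in this simple model."""
    return sum(think + answer for think, answer in hops)


if __name__ == "__main__":
    hops = [(400, 20), (300, 20), (500, 30)]  # hypothetical three-hop question
    print(ttft_think_then_answer(hops), ttft_interleaved(hops), total_tokens(hops))
```

In this toy model only the position of the first answer token changes; any reduction in total reasoning length would be a separate effect that the model does not capture.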
In the financial services industry, analysts need to switch between structured data (such as time-series pricing information), unstructured text (such as SEC filings and analyst reports), and audio/visual content (earnings calls and presentations). Each format requires different analytical approaches and specialized tools, creating workflow inefficiencies. Add on top of this the intense time pressure resulting …
Mixture-of-Experts (MoE) models are crucial for scaling model capacity while controlling inference costs. While integrating MoE into multimodal models like CLIP improves performance, training these models is notoriously challenging and expensive. We propose CLIP-Upcycling (CLIP-UP), an efficient alternative training strategy that converts a pre-trained dense CLIP model into a sparse MoE architecture. Through extensive experimentation …
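Converting a pre-trained dense model into a sparse MoE is commonly done by "upcycling": each expert in a new MoE layer is initialized as a copy of the dense feed-forward block, and only a router is newly initialized. The sketch below shows that generic initialization in plain Python. It is not CLIP-UP's recipe: which layers are converted, the number of experts, the routing scheme, and the training procedure are not described here, so `DenseMLP`, `UpcycledMoE`, `top_k`, and the router's random initialization are all assumptions.

```python
import copy
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class DenseMLP:
    """Hypothetical stand-in for one pre-trained feed-forward block: W_out @ relu(W_in @ x)."""

    def __init__(self, w_in: Matrix, w_out: Matrix):
        self.w_in = w_in
        self.w_out = w_out

    def __call__(self, x: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(self.w_in, x)]
        return matvec(self.w_out, hidden)


class UpcycledMoE:
    """Sparse MoE block initialized from a dense MLP (generic upcycling sketch)."""

    def __init__(self, dense: DenseMLP, num_experts: int, top_k: int = 2, seed: int = 0):
        # Every expert starts as an exact copy of the pre-trained dense weights.
        self.experts = [copy.deepcopy(dense) for _ in range(num_experts)]
        # Only the router is new; small Gaussian weights are an assumption.
        rng = random.Random(seed)
        d_model = len(dense.w_in[0])
        self.router = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(num_experts)]
        self.top_k = top_k

    def __call__(self, x: Vector) -> Vector:
        logits = matvec(self.router, x)
        # Send the input to the top-k experts by router score.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        # Softmax over the selected experts only, so the gate weights sum to 1.
        m = max(logits[i] for i in top)
        gates = [math.exp(logits[i] - m) for i in top]
        total = sum(gates)
        out = [0.0] * len(self.experts[0].w_out)
        for gate, i in zip(gates, top):
            y = self.experts[i](x)
            out = [o + (gate / total) * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    dense = DenseMLP(w_in=[[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]],
                     w_out=[[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
    moe = UpcycledMoE(dense, num_experts=4, top_k=2)
    x = [0.3, 0.7]
    print(dense(x), moe(x))
```

Because all experts start identical and the selected gate weights sum to one, the upcycled block initially reproduces the dense block's output (up to floating-point error); continued training is what lets the experts specialize.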
Organizations across a wide range of industries are struggling to process massive amounts of unstructured video and audio content to support their core business applications and organizational priorities. Amazon Bedrock Data Automation helps them meet this challenge by streamlining application development and automating workflows that use content from documents, images, audio, and video. Recently, we …
Read more “New Amazon Bedrock Data Automation capabilities streamline video and audio analysis”
Heard of AI agents lately? We know many of you are itching to start building them! Here’s your chance with the Agent Development Kit Hackathon with Google Cloud. Everyone’s talking about AI agents, but the real magic happens when they collaborate to tackle complex tasks. Think: complex processes, data analysis, content creation, and customer support. …