Combining next-token prediction and video diffusion in computer vision and robotics

In the current AI zeitgeist, sequence models have skyrocketed in popularity for their ability to analyze data and predict what to do next. For instance, you’ve likely used next-token prediction models like ChatGPT, which anticipate each word (token) in a sequence to form answers to users’ queries. There are also full-sequence diffusion models like Sora, …

Scalable Private Search with Wally

This paper presents Wally, a private search system that supports efficient semantic and keyword search queries against large databases. When sufficiently many clients are making queries, Wally’s performance is significantly better than previous systems. In previous private search systems, for each client query, the server must perform at least one expensive cryptographic operation per database …

dpg architecture

How DPG Media uses Amazon Bedrock and Amazon Transcribe to enhance video metadata with AI-powered pipelines

This post was co-written with Lucas Desard, Tom Lauwers, and Sam Landuydt from DPG Media. DPG Media is a leading media company in Benelux operating multiple online platforms and TV channels. DPG Media’s VTM GO platform alone offers over 500 days of non-stop content. With a growing library of long-form video content, DPG Media recognizes …

Beyond the basics: Build real-world gen AI skills with the latest learning paths from Google Cloud

The majority of organizations don’t feel ready for the AI era. In fact, 62% say they don’t have the expertise they need to unlock AI’s full potential.1 As the leader of learning for Google Cloud, the only thing that surprises me about that number is how low it is. I meet with customers every day, …

AI-driven video analyzer sets new standards in human action detection

What if a security camera could not only capture video but understand what’s happening—distinguishing between routine activities and potentially dangerous behavior in real time? That’s the future being shaped by researchers at the University of Virginia’s School of Engineering and Applied Science with their latest breakthrough: an AI-driven intelligent video analyzer capable of detecting human …

CAMPHOR: Collaborative Agents for Multi-Input Planning and High-Order Reasoning On Device

While server-side Large Language Models (LLMs) demonstrate proficiency in tool integration and complex reasoning, deploying Small Language Models (SLMs) directly on devices brings opportunities to improve latency and privacy but also introduces unique challenges for accuracy and memory. We introduce CAMPHOR, an innovative on-device SLM multi-agent framework designed to handle multiple user inputs and reason …