Video generation models as world simulators

We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.
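
The report stays at this high-level description, but the spacetime-patch idea it names can be sketched briefly. The following PyTorch snippet is an illustrative assumption of how a video's latent codes might be carved into a flat sequence of spacetime-patch tokens; the function name spacetime_patchify, the tensor layout, and the patch sizes are hypothetical choices for illustration, not Sora's published implementation.

    import torch

    def spacetime_patchify(latents, patch_t=2, patch_h=2, patch_w=2):
        # latents: (T, C, H, W) latent codes for T compressed video frames.
        # Assumes T, H, and W are divisible by the respective patch sizes.
        T, C, H, W = latents.shape
        # Split each axis into (number-of-blocks, within-block) pairs.
        x = latents.reshape(
            T // patch_t, patch_t,
            C,
            H // patch_h, patch_h,
            W // patch_w, patch_w,
        )
        # Bring block indices to the front, patch contents to the back.
        x = x.permute(0, 3, 5, 1, 4, 6, 2)  # (T', H', W', pt, ph, pw, C)
        # Flatten: one row per spacetime patch, ready for a transformer.
        return x.flatten(0, 2).flatten(1)   # (T'*H'*W', pt*ph*pw*C)

    # Example: 16 latent frames, 4 channels, 32x32 spatial -> 2048 tokens of dim 32.
    tokens = spacetime_patchify(torch.randn(16, 4, 32, 32))
    print(tokens.shape)  # torch.Size([2048, 32])

Because the token count simply tracks the input's latent dimensions, one sequence format can cover videos and images of varying durations, resolutions, and aspect ratios; an image is just the case where the temporal extent equals a single patch.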