Video generation models as world simulators

We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions, and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high-fidelity video. Our results suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world.
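The abstract describes the architecture only at a high level: videos are compressed into latent codes, the latents are decomposed into spacetime patches, and those patches serve as transformer tokens. As a rough illustration of that patchification step, here is a minimal PyTorch sketch; the function name, tensor layout, and patch sizes are assumptions for illustration, not Sora's actual implementation.

```python
import torch

def spacetime_patches(latents: torch.Tensor, pt: int = 2, ph: int = 2, pw: int = 2) -> torch.Tensor:
    """Split a video latent tensor into flattened spacetime patch tokens.

    latents: (T, C, H, W) latent codes from a video compression network.
    Returns: (num_patches, pt * ph * pw * C), a token sequence a transformer
    can consume regardless of the source video's duration or aspect ratio.
    """
    T, C, H, W = latents.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide patch sizes"
    # Carve the latent volume into a (T/pt, H/ph, W/pw) grid of patches.
    x = latents.reshape(T // pt, pt, C, H // ph, ph, W // pw, pw)
    x = x.permute(0, 3, 5, 1, 4, 6, 2)  # (T/pt, H/ph, W/pw, pt, ph, pw, C)
    # Flatten the grid into a sequence and each patch into a token vector.
    return x.reshape(-1, pt * ph * pw * C)

# Example: 16 latent frames with 4 channels at 32x32 spatial resolution
# yields 8 * 16 * 16 = 2048 tokens of dimension 2 * 2 * 2 * 4 = 32.
z = torch.randn(16, 4, 32, 32)
tokens = spacetime_patches(z)
assert tokens.shape == (2048, 32)
```

Because this tokenization is agnostic to the input's temporal and spatial extent, inputs of different durations, resolutions, and aspect ratios simply yield different numbers of tokens, which is part of what makes the joint variable-size training the abstract describes possible.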