
Video generation models as world simulators

We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions, and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high-fidelity video. Our results suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world.
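
To make the "spacetime patches" idea concrete, here is a minimal sketch of how a video latent tensor might be cut into non-overlapping time-height-width patches and flattened into a token sequence for a transformer. All sizes, the module name SpacetimePatchify, and the Conv3d-based projection are illustrative assumptions for this sketch, not Sora's actual implementation, which the report does not specify at this level of detail.

```python
# Sketch: turning video latent codes into a sequence of spacetime patch
# tokens. Shapes and hyperparameters below are assumed for illustration.
import torch
import torch.nn as nn

class SpacetimePatchify(nn.Module):
    def __init__(self, latent_channels=4, patch_t=2, patch_h=2, patch_w=2, dim=512):
        super().__init__()
        self.patch = (patch_t, patch_h, patch_w)
        # A 3D convolution with stride equal to its kernel size extracts
        # non-overlapping spacetime patches and projects each to an embedding.
        self.proj = nn.Conv3d(latent_channels, dim,
                              kernel_size=self.patch, stride=self.patch)

    def forward(self, z):
        # z: (batch, channels, frames, height, width) latent codes.
        # An image is just the single-frame case; here we assume the frame
        # count divides patch_t evenly.
        tokens = self.proj(z)                      # (B, dim, T', H', W')
        return tokens.flatten(2).transpose(1, 2)   # (B, T'*H'*W', dim)

patchify = SpacetimePatchify()
z = torch.randn(1, 4, 8, 32, 32)   # e.g. an 8-frame, 32x32 latent video
print(patchify(z).shape)           # torch.Size([1, 1024, 512])
```

Because the token count simply tracks the input's duration and spatial size, a design like this would let one transformer train jointly on videos and images of variable durations, resolutions, and aspect ratios, as the abstract describes.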
