SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

We introduce SlowFast-LLaVA-1.5 (abbreviated as SF-LLaVA-1.5), a family of video large language models (LLMs) offering a token-efficient solution for long-form video understanding. We incorporate the two-stream SlowFast mechanism into a streamlined training pipeline, and perform joint video-image training on a carefully curated data mixture of only publicly available datasets. Our primary focus is on highly efficient model scales (1B and 3B), demonstrating that even relatively small Video LLMs can achieve state-of-the-art performance on video understanding, meeting the demand for…
AI Generated Robotic Content
