NAVA: A 6.3B audio-video model

Page: https://ernie-research.github.io/NAVA/
Model: https://huggingface.co/ernie-research/NAVA
GitHub: https://github.com/ernie-research/NAVA

NAVA is a 6.3B-parameter joint audio-video generator that synthesizes synchronized video and audio from a single prompt, including multi-speaker speech with reference-timbre control and image-conditioned continuations.

Instead of post-hoc-aligned dual towers or fully unified tri-modal stacks, NAVA uses an Align-then-Fuse MMDiT: a dedicated alignment space first establishes audio-video correspondence, and context (text, speaker embeddings) is then fused in via cross-attention. On Verse-Bench, it sets a new state of the art on Sync-C, Sync-D, video quality, and audio WER while using 2× to 5× fewer parameters than open-source baselines.
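To make the align-then-fuse ordering concrete, here is a minimal sketch in plain Python. It is not NAVA's implementation: it assumes the "alignment space" can be approximated by joint self-attention over the concatenated video and audio tokens, it omits learned projections, normalization, and multiple heads, and every name in it (`attention`, `align_then_fuse`, the token counts) is hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(xs, ys):
    """Element-wise residual sum of two equally shaped token lists."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def align_then_fuse(video, audio, context):
    # Stage 1 (align, assumed form): video and audio tokens attend to each
    # other in one joint sequence; no context is involved yet.
    joint = video + audio
    aligned = add(joint, attention(joint, joint, joint))
    # Stage 2 (fuse): the aligned tokens query the context tokens
    # (stand-ins for text and speaker embeddings) via cross-attention.
    fused = add(aligned, attention(aligned, context, context))
    return fused[:len(video)], fused[len(video):]


# Toy usage with random tokens; sizes are arbitrary illustration values.
random.seed(0)
dim = 4


def tokens(n):
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


video, audio, context = tokens(3), tokens(5), tokens(2)
v_out, a_out = align_then_fuse(video, audio, context)
print(len(v_out), len(a_out))
```

The point of the ordering is that audio-video correspondence is established before any conditioning signal is mixed in, so the context only modulates tokens that are already aligned.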

submitted by /u/AgeNo5351

