
SAM Audio: the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts

SAM Audio is a foundation model that isolates any sound in an audio recording using text, visual, or temporal (span) prompts. It can separate a specific sound from a complex mixture based on a natural-language description, visual cues from an accompanying video, or a marked time span.

https://ai.meta.com/samaudio/

https://huggingface.co/collections/facebook/sam-audio

https://github.com/facebookresearch/sam-audio

submitted by /u/fruesome

Published by

AI Generated Robotic Content

Tags: ai images

3 months ago
