SAM Audio: the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts

SAM Audio is a foundation model for isolating any sound in audio. It separates specific sounds from complex audio mixtures based on one of three prompt types: a natural-language description (text prompt), visual cues from video (visual prompt), or a time span (span prompt).

https://ai.meta.com/samaudio/

https://huggingface.co/collections/facebook/sam-audio

https://github.com/facebookresearch/sam-audio

submitted by /u/fruesome

Published by

AI Generated Robotic Content

Tags: ai images

7 months ago
