Zelili AI

SAM Audio

The first unified model that isolates any sound from complex audio mixtures.
Founder: Meta Fundamental AI Research (FAIR) Team (Led by Bowen Shi, Andros Tjandra, et al.)
Tool Release Date: Dec 2025
Tool Users: 500K+
Pricing Model

Starting Price: $0/Month

About This AI

SAM Audio is Meta’s latest addition to the “Segment Anything” family, bringing the “cut-and-paste” revolution from images to sound. It is a unified foundation model designed to isolate specific audio sources from complex, noisy mixtures.

Unlike traditional audio tools that are specialized (e.g., only for voice or only for drums), SAM Audio is a generalist: you can ask it to “isolate the dog barking,” click on a guitar in a video to extract its sound, or highlight a specific timeframe to separate that event. It effectively acts as a “Magic Wand” for audio, allowing for precise editing and noise removal across speech, music, and environmental sounds.

Key Features

  1. Text Prompting: Isolate specific sounds by simply typing natural language descriptions like "glass breaking" or "female vocals."
  2. Visual Prompting: In video files, users can click on a visual object (e.g., a passing car) to automatically isolate the sound that object is making.
  3. Span Prompting (Time): Allows users to mark a specific time range on the timeline to target and extract the sound occurring in that window.
  4. Unified Architecture: A single model that handles speech enhancement, music source separation, and sound effect isolation without needing separate tools.
  5. Zero-Shot Generalization: Capable of separating sounds it has never explicitly been trained on, thanks to its massive training dataset.
  6. Residual Extraction: Automatically generates two tracks: the isolated target sound and the "residual" (everything else). Keep the target to extract a sound, or keep the residual to remove that sound while preserving the rest of the mix.
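
The three prompt types above can be pictured as fields of one request object. The following is a minimal sketch only: `SeparationPrompt` and `span_to_samples` are hypothetical names invented for illustration and are not part of SAM Audio's actual API. It shows how a text description, a clicked pixel, and a time span might be carried together, and how a span in seconds maps to sample indices at an assumed sample rate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SeparationPrompt:
    """Hypothetical container for the three prompt types (not SAM Audio's API)."""
    text: Optional[str] = None                    # e.g. "glass breaking"
    click_xy: Optional[Tuple[int, int]] = None    # pixel clicked in a video frame
    span_s: Optional[Tuple[float, float]] = None  # (start, end) in seconds

    def __post_init__(self) -> None:
        # Require at least one way of pointing at the target sound.
        if self.text is None and self.click_xy is None and self.span_s is None:
            raise ValueError("at least one prompt type is required")
        if self.span_s is not None:
            start, end = self.span_s
            if start < 0 or end <= start:
                raise ValueError("span must satisfy 0 <= start < end")


def span_to_samples(span_s: Tuple[float, float], sample_rate: int) -> Tuple[int, int]:
    """Convert a (start, end) span in seconds to sample indices."""
    start, end = span_s
    return round(start * sample_rate), round(end * sample_rate)


# Combine a text prompt with a time span; 16 kHz is an assumed sample rate.
prompt = SeparationPrompt(text="dog barking", span_s=(1.5, 3.0))
print(span_to_samples(prompt.span_s, sample_rate=16_000))  # (24000, 48000)
```

Keeping all prompt types optional in a single object mirrors the idea that prompts can be used alone or together; how the real model combines them is not specified here.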

Pros

  1. The first tool to successfully combine text, visual, and time prompts for audio.
  2. Completely free and open-source (available on Hugging Face).
  3. Solves complex "cocktail party" problems where multiple sounds overlap.
  4. Visual prompting is a game-changer for video editors.
  5. High fidelity separation for music stems.

Cons

  1. Requires technical knowledge (Python/CLI) to run locally; no official polished app yet.
  2. Can still struggle with highly overlapping voices (e.g., two people talking over each other).
  3. High hardware requirements (GPU) for efficient inference.

Best for: audio engineers, video editors, researchers, and developers looking to build advanced audio cleaning or editing tools.

FAQs

  • Is SAM Audio free?

    Yes, Meta has released the code and model weights under a permissive license (SAM License) for research and commercial use. You can download them for free from GitHub.

  • Can SAM Audio remove background noise?

    Yes, by prompting it to isolate the “speech” or “voice,” it will separate the vocals from the background noise, effectively acting as a powerful noise canceler. Alternatively, you can select the noise (e.g., “siren”) to remove just that element.

  • Does it work on videos?

    Yes, SAM Audio is multimodal. If you provide a video file, you can use the “Visual Prompt” feature to click on objects in the video frame to help the AI identify which sound to isolate.

  • How is this different from tools like Lalal.ai?

    Tools like Lalal.ai are typically specialized for music (separating drums, bass, vocals). SAM Audio is a generalist that can separate any sound, from a cat meowing to a car door slamming, using text descriptions, which most music tools cannot do.
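
The noise-removal answer above depends on the two-track output: prompt for the unwanted sound, then keep the residual. The toy sketch below illustrates that idea under a simplifying assumption, namely that the residual equals the mixture minus the target, sample by sample. The real model may generate its residual track differently, and `split_with_residual` is a hypothetical helper, not a SAM Audio function. The target is supplied directly here instead of being produced by a model.

```python
from typing import List, Tuple


def split_with_residual(
    mixture: List[float], target: List[float]
) -> Tuple[List[float], List[float]]:
    """Return (target, residual), assuming residual = mixture - target."""
    if len(mixture) != len(target):
        raise ValueError("mixture and target must have the same length")
    residual = [m - t for m, t in zip(mixture, target)]
    return target, residual


# Toy signals: a "voice" and a "siren" summed into one mixture.
voice = [0.5, -0.25, 0.0, 0.25]
siren = [0.25, 0.25, -0.5, 0.5]
mixture = [v + s for v, s in zip(voice, siren)]

# Prompting for "siren" makes the siren the target; the residual is the
# cleaned track with the siren removed.
_, cleaned = split_with_residual(mixture, target=siren)
print(cleaned)  # [0.5, -0.25, 0.0, 0.25]
```

In this idealized case the residual recovers the voice exactly; with real recordings and an imperfect separation, some of the removed sound would remain in the residual.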
