
Summary Box [In a hurry? Just read this ⚡]
- ACE-Step v1.5 is a brand-new open-source music foundation model that generates full songs extremely fast: in under 2 seconds on an A100 or under 10 seconds on an RTX 3090.
- It uses less than 4GB VRAM, making it runnable on consumer hardware while delivering audio quality that frequently beats or matches expensive commercial tools like Suno v5 and Udio v1.5.
- Key strengths include blazing generation speed (roughly 10–120× faster than competitors), support for 50+ languages, and powerful editing features like vocal-to-instrumental conversion, section repainting, and cover creation.
- You can train a personal LoRA adapter with just a few of your own tracks to capture your unique musical style and preferences.
- The model combines a Language Model for song planning + lyrics, a Diffusion Transformer for audio synthesis, and internal reinforcement learning to keep outputs clean and unbiased – all fully open-source and free to download/use.
This one is for music fans: ACE-Step v1.5 just dropped, and it’s shaking things up in the world of AI-generated tunes.
This open-source music foundation model from the ACE-Step team is all about making pro-level song creation super accessible, running smoothly on everyday GPUs without guzzling tons of power or memory.
Imagine whipping up a full track in under two seconds on high-end hardware or about 10 seconds on something like an RTX 3090. That’s the kind of zippy performance we’re talking about here.
What makes this model cool is how it punches above its weight. It scores big on quality metrics, often outdoing pricey commercial options, all while keeping things lightweight with less than 4GB VRAM needed.
Why ACE-Step v1.5 Stands Out: Key Perks

Let’s break down what sets this model apart in a simple list:
- Blazing Fast Generation: Cranks out complete songs super quick: think seconds, not minutes.
- Personalized Tweaks: Train your own LoRA adapter with just a handful of tracks to nail your unique style.
- Hybrid Tech Magic: Blends a Language Model for planning with a Diffusion Transformer for smooth audio creation, plus internal reinforcement learning to keep things bias-free.
- Multi-Language Vibes: Handles over 50 languages, so you can generate tracks in English, Chinese, or whatever floats your boat.
- Editing Superpowers: Go beyond basics with features like making covers, repainting sections, or converting vocals to background music.
These bits make it versatile for everything from casual jams to more polished productions.
How It Stacks Up Against the Competition
🚀 ACE-Step v1.5 is out: an open-source music foundation model that runs locally on consumer GPUs (<4GB VRAM) and generates full songs in <2s (A100) or <10s (RTX 3090).
✅ Beats most commercial models in quality
✅ Train a personalized LoRA from just a few tracks
✅ Built on a…
ModelScope (@ModelScope2022), February 4, 2026
Curious about the numbers? Here’s a rundown of how ACE-Step v1.5 compares to other popular models on key audio quality metrics like content enjoyment (CE), coherence, musicality, and more.
Higher scores are better on the quality metrics, while for generation speed lower is better. Keep an eye on that speed figure: it’s a game-changer.
| Model | CE | CU | PC | PQ | Coh. | Mus. | Cla. | Nat. | Style Align | Lyric Align | Generation Speed (min/song on A100) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Udio-v1.5 | 7.45 | 7.65 | 6.15 | 8.03 | 4.15 | 3.96 | 4.09 | 3.93 | 3.86 | 34.9 | 4 min |
| Suno-v4.5 | 7.63 | 7.85 | 6.22 | 8.25 | 4.64 | 4.51 | 4.63 | 4.53 | 4.49 | 40.5 | 3.2 min |
| Suno-v5 | 7.69 | 7.87 | 6.51 | 8.29 | 4.72 | 4.62 | 4.71 | 4.63 | 4.56 | 46.8 | 3.4 min |
| Mureka-v7.6 | 7.44 | 7.71 | 6.35 | 8.13 | 4.33 | 4.29 | 4.35 | 4.29 | 4.21 | 36.2 | 2.2 min |
| MinMax 2.0 | 7.71 | 7.95 | 6.42 | 8.38 | 4.61 | 4.51 | 4.59 | 4.50 | 4.41 | 43.1 | 2.9 min |
| Yue | 6.58 | 7.29 | 4.95 | 7.39 | 3.01 | 2.80 | 2.85 | 2.79 | 2.82 | 26.8 | ~4.6 min |
| ACE-Step 1.0 | 7.22 | 7.52 | 6.50 | 7.76 | 3.99 | 3.73 | 3.85 | 3.78 | 3.68 | 28.5 | 0.9 min |
| LeVo | 7.61 | 7.78 | 5.92 | 8.31 | 3.55 | 3.35 | 3.32 | 3.31 | 3.20 | 29.4 | ~1.2 min |
| DiffRhythm 2 | 7.25 | 7.61 | 6.33 | 7.99 | 3.98 | 3.79 | 3.97 | 3.82 | 3.66 | 32.1 | 3.8 min |
| HeartMuLa | 7.66 | 7.89 | 6.15 | 8.25 | 4.68 | 4.55 | 4.69 | 4.55 | 4.45 | 31.7 | 2.8 min |
| ACE-Step 1.5 | 7.42 | 8.09 | 6.47 | 8.35 | 4.72 | 4.67 | 4.72 | 4.66 | 4.59 | 39.1 | 2.6 min |
As you can see, ACE-Step v1.5 holds its own or leads in several areas: it tops the CU, Mus., Cla., Nat., and Style Align columns and ties Suno-v5 on Coh. That is even more impressive once you factor in the speed boost, quoted at 10–120× faster than some rivals.
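To see where a figure like 120× can come from, here is a quick back-of-the-envelope sketch. It divides a rival's generation time by ACE-Step v1.5's time. The rival times are copied from a handful of rows in the table; the 2-second ACE-Step figure is an assumption taken from the "under 2 seconds on an A100" claim, so treat the results as rough ratios rather than benchmarks.

```python
# Generation times in minutes per song on an A100, copied from the table.
RIVAL_MINUTES = {
    "Udio-v1.5": 4.0,
    "Suno-v4.5": 3.2,
    "Suno-v5": 3.4,
    "Mureka-v7.6": 2.2,
    "MinMax 2.0": 2.9,
}

# Assumption: the "under 2 seconds on an A100" claim, used as a round number.
ACE_STEP_SECONDS = 2.0


def speedup(rival_minutes, ace_seconds=ACE_STEP_SECONDS):
    """Return how many times faster ACE-Step would be than a rival."""
    return rival_minutes * 60.0 / ace_seconds


for name, minutes in RIVAL_MINUTES.items():
    print(f"{name}: about {speedup(minutes):.0f}x faster")
```

With these inputs, Udio-v1.5 at 4 minutes per song works out to 120×, which lines up with the top of the quoted range.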
Behind the Scenes: What Powers This Beast
At its heart, ACE-Step v1.5 uses a clever mix of tech. The Language Model acts like a smart planner, turning your text prompts into detailed song blueprints, complete with lyrics and structure.
Then the Diffusion Transformer takes over to generate the actual audio.
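The two-stage handoff described above can be pictured with a toy sketch. Everything here is hypothetical: `SongBlueprint`, `AudioClip`, `plan_song`, and `synthesize` are made-up names for illustration, not ACE-Step's actual API, and the stand-in functions only mimic the shape of the data flow (prompt to plan, plan to audio) without generating any real music.

```python
from dataclasses import dataclass, field


@dataclass
class SongBlueprint:
    """Hypothetical plan handed from the planner to the synthesizer."""
    prompt: str
    sections: list
    lyrics: dict = field(default_factory=dict)


@dataclass
class AudioClip:
    """Hypothetical stand-in for generated audio (no real samples here)."""
    sample_rate: int
    num_samples: int

    @property
    def seconds(self) -> float:
        return self.num_samples / self.sample_rate


def plan_song(prompt: str) -> SongBlueprint:
    """Stage 1 stand-in: a real Language Model would write structure and lyrics."""
    sections = ["intro", "verse", "chorus", "outro"]
    lyrics = {s: f"<{s} lyrics about {prompt}>" for s in ("verse", "chorus")}
    return SongBlueprint(prompt=prompt, sections=sections, lyrics=lyrics)


def synthesize(blueprint: SongBlueprint, seconds_per_section: int = 15,
               sample_rate: int = 44100) -> AudioClip:
    """Stage 2 stand-in: a real Diffusion Transformer would render the audio."""
    total = len(blueprint.sections) * seconds_per_section * sample_rate
    return AudioClip(sample_rate=sample_rate, num_samples=total)


blueprint = plan_song("dreamy lo-fi beat")
clip = synthesize(blueprint)
print(blueprint.sections, clip.seconds)
```

The point of the split is that the planner only deals in text and structure, while the synthesizer only needs the finished blueprint, so each stage can be swapped or tuned on its own.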

Plus, the reinforcement learning happens inside the model itself rather than leaning on outside feedback, which keeps external biases out and results cleaner. This setup lets you handle long tracks up to 10 minutes and tweak styles on the fly.
If you’re into customizing, the LoRA training is a breeze: feed it a few of your favorite songs, and boom, it adapts to your vibe.
And with support for editing tricks like repainting parts of a track or swapping vocals for instrumentals, it’s not just a generator; it’s a full-on music workshop.
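For a feel of why a LoRA adapter is light enough to train on a few tracks, here is a generic sketch of low-rank adaptation in plain Python. It is an illustration of the general LoRA idea (a frozen weight matrix plus a small trainable low-rank update), not ACE-Step's training code, and the layer size and rank below are made-up numbers.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute W x + (alpha / rank) * B (A x), with W frozen and A, B trainable."""
    rank = len(A)                       # A has shape rank x d_in
    base = matvec(W, x)                 # frozen base layer output
    update = matvec(B, matvec(A, x))    # low-rank correction, B is d_out x rank
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


def full_params(d_out, d_in):
    """Values you would train when fine-tuning the whole matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Values you train with a rank-`rank` adapter instead."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only to show the ratio.
print(full_params(1024, 1024), lora_params(1024, 1024, 8))

# With B starting at zero, the adapter leaves the base output unchanged.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.0], [0.0]]
print(lora_forward(W, A, B, [2.0, 3.0]))
```

For the made-up 1024 × 1024 layer at rank 8, the adapter trains about 1.6% as many values as the full matrix, which is the basic reason a small personal dataset can be enough to shift the style.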
Wrapping It Up: Why Give It a Spin?
In a nutshell, ACE-Step v1.5 is making waves by democratizing high-quality music AI. It’s free to grab from spots like Hugging Face or ModelScope, and the demo spaces let you test it out without setup hassles.
For anyone dabbling in music creation, this could speed up your workflow big time, whether you’re remixing hits or dreaming up originals. Dive in and see what tunes you can cook up!



