What is TTS-VC-Flash?
TTS-VC-Flash is Qwen3-TTS’s voice cloning model from Alibaba, enabling high-fidelity speech generation from just 3 seconds of reference audio in 10 languages.
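Before uploading a reference clip, it can help to check its length. The sketch below uses only Python's standard `wave` module. It assumes the clip is an uncompressed WAV file and treats the 3-second figure above as a minimum, which is an assumption. The function names and the `MIN_REFERENCE_SECONDS` constant are hypothetical and not part of any Qwen or Alibaba Cloud SDK.

```python
import wave

# Assumption: the "3 seconds" figure from this FAQ is treated as a lower bound.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether a reference clip meets the assumed minimum length."""
    return wav_duration_seconds(path) >= minimum
```

Calling `is_long_enough("reference.wav")` on a local file returns `True` when the clip is at least 3 seconds long. Other formats such as MP3 would need a different decoder.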
When was TTS-VC-Flash released?
It was introduced on December 22, 2025, as part of the Qwen3-TTS family updates, with demos and API access available shortly after.
How many languages does TTS-VC-Flash support?
It supports voice cloning and generation in 10 languages: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian.
Is TTS-VC-Flash free to use?
Free demos are available on Hugging Face and ModelScope. Full API usage through Alibaba Cloud is pay-per-use (token- or character-based), with potential free-tier credits.
How accurate is the voice cloning in TTS-VC-Flash?
It achieves lower word error rates than ElevenLabs, MiniMax, and GPT-4o-Audio-Preview on multilingual tests, with strong timbre and prosody preservation.
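For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the transcript recognized from the synthesized audio, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the evaluation pipeline behind the comparison above. Whitespace tokenization is an assumption that does not suit languages written without spaces, such as Chinese or Japanese, where character-level error rates are commonly used instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A lower value means the synthesized speech was transcribed more faithfully. A WER of 0 means the recognized text matches the reference exactly.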
Can TTS-VC-Flash be used offline?
No. It is primarily API-based through Alibaba Cloud, and no local open-source weights are mentioned for the Flash variant specifically.
What makes TTS-VC-Flash fast?
The ‘Flash’ name indicates optimized low-latency cloning and synthesis, supporting real-time streaming via WebSocket for interactive applications.
Where can I try TTS-VC-Flash?
Test it directly in the Hugging Face demo space (Qwen/Qwen-TTS-Clone-Demo) or in the ModelScope studio, with no setup required.





