What is the difference between TTS-VC and TTS-VD?
VC (Voice Cloning) reproduces an existing voice from a reference audio file. VD (Voice Design) creates a brand-new voice from a text description (e.g., “Make a voice that sounds like a scary pirate”).
Can I save the voice I designed?
Yes. Once you generate a voice you like via the API, you receive a “Voice ID” that you can pass in future calls so that the same character speaks new text consistently.
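Because the Voice ID is what makes a designed voice reusable, it is worth storing it somewhere durable. The following is a minimal sketch of one way to do that locally, assuming the ID is a plain string; the function names, the JSON file layout, and the example ID are all hypothetical and are not part of the service's API.

```python
import json
from pathlib import Path


def save_voice_id(name, voice_id, path):
    """Store a Voice ID under a friendly name in a local JSON file."""
    path = Path(path)
    # Load the existing mapping if the file is already there.
    voices = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    voices[name] = voice_id
    path.write_text(json.dumps(voices, indent=2), encoding="utf-8")


def load_voice_id(name, path):
    """Return the Voice ID previously saved under `name`."""
    voices = json.loads(Path(path).read_text(encoding="utf-8"))
    return voices[name]
```

For example, after designing a pirate character you could call save_voice_id("pirate", "<your-voice-id>", "voices.json") once, then call load_voice_id("pirate", "voices.json") before each later synthesis request.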
How detailed can the prompts be?
Very detailed. You can specify age ranges (e.g., “Child 5-12”), specific emotions (“calm,” “excited”), speaking rates (“fast,” “slow”), and even textural qualities like “raspy” or “sweet.”
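To keep descriptions consistent across many characters, the attributes above can be assembled into a prompt programmatically. This is only an illustrative sketch: the helper name, its parameters, and the sentence layout it produces are assumptions, and the service may respond better to other phrasings.

```python
def build_voice_prompt(age_range=None, emotion=None, rate=None, texture=None):
    """Combine optional voice attributes into one text description."""
    parts = []
    if age_range:
        parts.append(f"age range: {age_range}")
    if emotion:
        parts.append(f"emotion: {emotion}")
    if rate:
        parts.append(f"speaking rate: {rate}")
    if texture:
        parts.append(f"texture: {texture}")
    if not parts:
        raise ValueError("at least one attribute is required")
    return "A voice with " + ", ".join(parts) + "."


prompt = build_voice_prompt(
    age_range="Child 5-12", emotion="excited", rate="fast", texture="raspy"
)
print(prompt)
```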
Is TTS-VD-Flash expensive?
It follows the standard Alibaba Cloud pricing model, which is generally competitive and is often billed per million characters of text. New users get a free quota to test the capabilities.
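As a rough illustration of how per-million-character billing adds up, here is a small estimator. The price and free-quota figures passed in are placeholders rather than real rates, and the assumption that the free quota is simply subtracted from usage is a simplification.

```python
def estimate_cost(characters, price_per_million, free_quota_chars=0):
    """Estimate the charge for synthesizing `characters` characters.

    price_per_million and free_quota_chars are caller-supplied
    placeholders, not published rates.
    """
    billable = max(0, characters - free_quota_chars)
    return billable / 1_000_000 * price_per_million


# Hypothetical numbers: 2.5M characters, 500k free, 1.0 currency unit per million.
print(estimate_cost(2_500_000, price_per_million=1.0, free_quota_chars=500_000))
```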








