What is Pocket TTS?
Pocket TTS is a 100-million-parameter text-to-speech model from Kyutai that runs in real time on a CPU and supports high-quality voice cloning from about 5 seconds of reference audio.
When was Pocket TTS released?
Pocket TTS was officially released on January 13, 2026, when the announcement, technical report, code, and model weights were made public.
Is Pocket TTS free to use?
Yes. It is completely free and open source under the MIT license, with the full model weights, code, and demos available on GitHub and Hugging Face.
Does Pocket TTS require a GPU?
No. It is specifically designed to run efficiently in real time on a standard CPU (e.g., Intel Core Ultra or Apple M3); no GPU is needed.
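"Real time" is usually quantified with the real-time factor (RTF): synthesis time divided by the duration of the generated audio, where a value below 1.0 means audio is produced faster than it plays back. The sketch below shows one way to measure it on your own machine. The `fake_synthesize` function and the 24 kHz sample rate are placeholders chosen for illustration, not the Pocket TTS API or its actual output rate; swap in whatever call and sample rate your installation provides.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return wall-clock synthesis time divided by generated audio duration."""
    start = time.perf_counter()
    samples = synthesize(text)  # expected to return a sequence of audio samples
    elapsed = time.perf_counter() - start
    if len(samples) == 0:
        raise ValueError("synthesize() returned no audio")
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds


# Hypothetical stand-in for a TTS call (NOT the Pocket TTS API):
# it sleeps briefly and returns one second of silence at an assumed 24 kHz.
def fake_synthesize(text):
    time.sleep(0.01)
    return [0.0] * 24000


rtf = real_time_factor(fake_synthesize, "Hello world.", sample_rate=24000)
verdict = "real time" if rtf < 1.0 else "slower than real time"
print(f"RTF = {rtf:.3f} ({verdict})")
```

Measuring RTF on the target CPU is more informative than relying on a published figure, since it depends on the processor, thread count, and text length.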
How does voice cloning work in Pocket TTS?
Provide about 5 seconds of reference audio. The model encodes the characteristics of the voice (tone, emotion, accent) and then generates speech in that cloned voice from any text input.
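Before passing a clip to the model, it can help to confirm that it is long enough. The following is a minimal sketch using only Python's standard `wave` module; the 5-second threshold simply mirrors the "about 5 seconds" figure above, and the helper names (`wav_duration_seconds`, `has_enough_audio`, `write_silence`) are hypothetical, not part of Pocket TTS. It handles uncompressed WAV files only.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def has_enough_audio(path, min_seconds=5.0):
    """Check the clip against an illustrative minimum length (assumption)."""
    return wav_duration_seconds(path) >= min_seconds


def write_silence(path, seconds, sample_rate=16000):
    """Write a mono 16-bit silent WAV file, used here only as demo input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))


with tempfile.TemporaryDirectory() as tmp:
    clip = os.path.join(tmp, "reference.wav")
    write_silence(clip, seconds=5.0)
    duration = wav_duration_seconds(clip)
    ok = has_enough_audio(clip)
    print(f"duration: {duration:.1f} s, long enough: {ok}")
```

In practice you would point `has_enough_audio` at your own recording instead of the generated silent file.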
What languages does Pocket TTS support?
It is trained exclusively on public English datasets and performs best in English; the base model includes no multilingual support.
Where can I try or download Pocket TTS?
An online demo is available at kyutai.org/tts. For local installation, use the GitHub repo (kyutai-labs/pocket-tts) or Hugging Face (kyutai/pocket-tts); pip and uvx installs are also available.
How does Pocket TTS compare to other TTS models?
It matches or exceeds larger GPU-based models in quality while remaining lightweight and CPU-only, and it outperforms competitors such as F5-TTS and Kokoro on word error rate (WER) and CPU speed.
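For readers unfamiliar with WER: in TTS evaluation it is typically obtained by transcribing the generated audio with a speech recognizer and comparing the transcript to the input text. It is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, so lower is better. The sketch below shows the standard calculation; the lowercase-and-split normalization is a simplifying assumption, and this FAQ does not specify the exact evaluation protocol used for the comparison above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)


# One substituted word out of five reference words.
wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
print(f"WER = {wer:.2f}")
```

A WER of 0 means the transcript matches the input text exactly; values can exceed 1.0 when the transcript contains many inserted words.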




