About this model
Qwen 3 TTS 0.6B is the compact member of Alibaba Cloud's open-source Qwen3-TTS family, released by the Qwen team for multilingual, controllable, streaming speech synthesis. It is built on the team's own Qwen3-TTS-Tokenizer-12Hz, which compresses audio into discrete acoustic tokens, and uses a discrete multi-codebook language-model architecture to model speech end to end. According to the official model card, it was trained on over 5 million hours of speech spanning 10 languages and supports voice cloning from a 3-second reference clip as well as description-based voice control.
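To give a rough sense of what a low-frame-rate, multi-codebook tokenizer implies, the sketch below estimates how many token frames and discrete tokens a clip would occupy. It is illustrative arithmetic only, not the model's API: the function names are hypothetical, the 12 frames-per-second figure is an assumption read off the tokenizer's name, and the number of codebooks is left as a caller-supplied parameter because the description above does not state it.

```python
import math

# Assumption: the "12Hz" in the tokenizer's name is read as a nominal rate of
# 12 token frames per second. The actual rate may differ; check the model card.
NOMINAL_FRAME_RATE_HZ = 12.0


def frames_for(duration_s: float, frame_rate_hz: float = NOMINAL_FRAME_RATE_HZ) -> int:
    """Return the number of token frames needed to cover duration_s seconds."""
    if duration_s < 0 or frame_rate_hz <= 0:
        raise ValueError("duration must be >= 0 and frame rate must be > 0")
    return math.ceil(duration_s * frame_rate_hz)


def discrete_tokens_for(
    duration_s: float,
    num_codebooks: int,
    frame_rate_hz: float = NOMINAL_FRAME_RATE_HZ,
) -> int:
    """Return the total token count, assuming one token per codebook per frame.

    num_codebooks is a hypothetical parameter: the source text says the model
    is multi-codebook but does not give the count.
    """
    if num_codebooks < 1:
        raise ValueError("num_codebooks must be >= 1")
    return frames_for(duration_s, frame_rate_hz) * num_codebooks


if __name__ == "__main__":
    # A 3-second cloning reference under the nominal-rate assumption.
    print(frames_for(3.0))
    # Same clip with a placeholder codebook count of 4 (illustrative only).
    print(discrete_tokens_for(3.0, num_codebooks=4))
```

Under the nominal-rate assumption, a 3-second reference clip maps to 36 frames, which illustrates why a low frame rate keeps the sequences the language model has to handle short.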
The headline feature is speed: the provider reports end-to-end synthesis latency as low as 97 ms, which suits real-time interactive scenarios. The CustomVoice variant offers 9 premium timbres with fine-grained style control via natural-language instructions across the 10 supported languages: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish and Italian.
Within the family, the 0.6B sits alongside its larger sibling, Qwen 3 TTS 1.7B. Both share the same 12Hz-tokenizer foundation and capabilities; the 0.6B is the lighter-weight variant, targeting reduced compute and quicker inference.
A Qwen3-TTS Technical Report accompanies the release, and the models are distributed openly on Hugging Face and ModelScope. Its small size and reported low latency make the 0.6B a practical choice for low-latency, on-device or cost-sensitive speech applications.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the sources listed below.
Data sources: Venice API · HuggingFace · Wikipedia