About this model
Gemini 3.1 Flash TTS is Google's text-to-speech model, released in April 2026 and offered in public preview through Google AI Studio and Vertex AI. It converts text into natural-sounding speech with low latency. Developers can steer delivery through descriptive style prompts and through a set of inline audio tags that adjust tone, pacing, and emotion within a single utterance. According to its DeepMind model card, the model is based on Gemini 3 Pro, supports outputs of up to 32K tokens, and watermarks all generated audio with SynthID.
Google positions the model for accessibility use cases such as screen readers and augmentative and alternative communication (AAC) devices, as well as for audiobooks, gaming, and customer-facing voice systems. Among the headline features in Google's official materials is the expressive audio-tag system for controlling narration.
Within Google's broader 2026 lineup, Gemini 3.1 Flash TTS handles speech synthesis, while text generation is served by siblings such as Gemini 3.5 Flash and Gemini 3.1 Pro Preview. It shares the Flash Audio designation with Gemini 3.1 Flash Live: where the Live variant handles real-time voice interaction, Flash TTS covers the pre-generated portion of voice workflows.
Because no prior same-family TTS model appears in this catalog's sibling list, generational comparisons here are limited to the features Google describes in its official documentation: the inline audio-tag system and per-character style configuration.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the sources listed below.
Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago