Gemini 3.1 Flash TTS

Quick reference

Gemini 3.1 Flash TTS — TLDR

- 🗣️ Google's low-latency text-to-speech model for expressive speech.
- 🆕 Adds inline audio tags to steer delivery and emotion.
- 🎯 Steerable style prompts shape tone and pacing.
- 📏 Generates up to 32K tokens of audio output per request.
- 🔒 Output watermarked with SynthID for AI-content identification.
- 🏢 Available in public preview on Google AI Studio and Vertex AI.
- 🧠 Based on the Gemini 3 Pro foundation model.

💰 Pricing

$188 per 1M characters
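As a rough illustration of the listed rate, the sketch below estimates what a given piece of text would cost to synthesize. The $188 per 1M characters figure comes from this page; everything else is an assumption made for illustration: that billing scales linearly with character count, that every character (including whitespace) is billed, and that there is no minimum charge or rounding. The function name `estimate_cost_usd` is hypothetical and not part of any Venice or Google API.

```python
from decimal import Decimal

# Rate listed on this page: $188 per 1,000,000 characters.
PRICE_PER_MILLION_CHARS_USD = Decimal("188")


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate synthesis cost for `text` in USD.

    Assumptions (not confirmed by the source): billing is linear in the
    number of characters, whitespace counts, and there is no minimum fee.
    """
    billable_chars = len(text)
    return PRICE_PER_MILLION_CHARS_USD * billable_chars / Decimal(1_000_000)


if __name__ == "__main__":
    sample = "a" * 5_000  # stand-in for a 5,000-character script
    print(f"Estimated cost: ${estimate_cost_usd(sample)}")
```

Using `Decimal` rather than floats keeps small per-request amounts exact when they are summed across many requests.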

📅 On Venice since

Apr 20, 2026

91 days ago

Provider

Google

Google is an American multinational technology corporation and one of the world's most valuable brands. A subsidiary of parent company Alphabet Inc., Google operates across search, cloud computing, consumer electronics, and artificial intelligence. Its…

30 models on Venice

11 video · 10 text · 3 image · 3 inpaint · 1 music · 1 embedding · 1 tts

Since Oct 15, 2024

About this model

Gemini 3.1 Flash TTS is Google's text-to-speech model, released in April 2026 and offered in public preview through Google AI Studio and Vertex AI. It converts text into natural-sounding speech with low latency. Developers steer delivery in two ways: descriptive style prompts, and inline audio tags that adjust tone, pacing, and emotion within an utterance. According to its DeepMind model card, the model is based on Gemini 3 Pro, produces up to 32K tokens of audio output, and watermarks all generated audio with SynthID.
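To make the idea of combining a style prompt with inline audio tags concrete, here is a minimal sketch that assembles the text one might send to the model. It does not call any API. The square-bracket tag syntax, the example tag name, and the helper `build_tts_input` are all assumptions made for illustration; the source does not specify the tag format, so check Google's documentation for the actual syntax and the supported tags.

```python
from typing import Optional, Sequence, Tuple

# (tag name or None, spoken text). The tag names are hypothetical examples.
Segment = Tuple[Optional[str], str]


def build_tts_input(style_prompt: str, segments: Sequence[Segment]) -> str:
    """Join a style prompt and tagged segments into one input string.

    Assumption: inline tags are written as "[name]" directly before the
    text they affect. This format is illustrative, not confirmed.
    """
    parts = []
    for tag_name, text in segments:
        prefix = f"[{tag_name}] " if tag_name else ""
        parts.append(prefix + text.strip())
    script = " ".join(parts)
    # Style prompt first, then a blank line, then the script to be spoken.
    return f"{style_prompt.strip()}\n\n{script}"


if __name__ == "__main__":
    prompt = build_tts_input(
        "Read warmly and slowly.",
        [(None, "Welcome back."), ("whispering", "I saved you a seat.")],
    )
    print(prompt)
```

Keeping the style prompt separate from the tagged script reflects the two steering mechanisms described above: the prompt sets the overall delivery, while tags change it at specific points within the utterance.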

Google positions the model for accessibility use cases such as screen readers and AAC devices, along with applications spanning audiobooks, gaming, and customer-facing voice systems. Google's official materials present the expressive audio-tag system for narration control as one of the model's headline features.

Within Google's broader 2026 lineup, Gemini 3.1 Flash TTS handles the speech-synthesis modality, while text generation is served by siblings such as Gemini 3.5 Flash and Gemini 3.1 Pro Preview. It shares the Flash Audio designation with the real-time Gemini 3.1 Flash Live variant, covering the pre-generated portion of voice workflows.

Because no prior same-family TTS model appears in this catalog's sibling list, generational comparisons here are limited to the features Google describes in its official documentation: the inline audio-tag system and per-character style configuration.

Sources

Gemini 3.1 Flash Audio (Flash Live, TTS) - Model Card | Google DeepMind (deepmind.google)

Gemini 3.1 Flash TTS on Google Cloud | Google Cloud Blog (cloud.google.com)

Gemini 3.1 Flash-Lite | Gemini Enterprise Agent Platform | Google Cloud Documentation (docs.cloud.google.com)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 5d ago