Gradium TTS

Quick reference

Gradium TTS — TLDR

- 🌐 Cloud text-to-speech served over a low-latency streaming WebSocket API.
- 💬 Built for real-time voice agents and conversational pipelines.
- 🔧 Integrates with Pipecat and LiveKit agent frameworks.
- 🎯 Selectable voice IDs plus runtime-configurable synthesis settings.
- ⚡ Streams base64-encoded PCM audio chunks at 24kHz.
- 🔒 Requires a Gradium API key for authentication.
- 🏢 Vendor-hosted cloud API; a separate on-device sibling model, Phonon, covers offline use.

💰 Pricing: $47.50 per 1M characters
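As a rough illustration of the rate above, the following sketch estimates the cost of synthesizing a piece of text. It assumes billing is strictly linear in character count with no minimum charge or rounding rule; the page does not state those details, and the function name `estimate_cost_usd` is hypothetical.

```python
from decimal import Decimal

# Rate taken from this page: $47.50 per 1,000,000 characters.
PRICE_PER_MILLION_CHARS = Decimal("47.50")


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate synthesis cost in USD.

    Assumption: every character is billed at the same linear rate,
    with no minimum charge or rounding (not confirmed by the page).
    """
    return Decimal(len(text)) * PRICE_PER_MILLION_CHARS / Decimal(1_000_000)


# Example: a 1,000-character reply.
reply = "a" * 1000
print(f"Estimated cost: ${estimate_cost_usd(reply)}")
```

Using `Decimal` rather than floats keeps small per-request amounts exact when they are summed across many requests.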

📅 On Venice since: Jun 5, 2026

Provider

Gradium

Gradium is an AI organization focused on speech technology, developing systems that convert written text into natural-sounding audio. Its work centers on text-to-speech synthesis, a domain where quality of voice, clarity, and responsiveness matter as much as…

1 model on Venice

1 tts

About this model

Gradium TTS is a cloud-hosted text-to-speech service from Gradium, exposed through a low-latency streaming WebSocket API designed for real-time voice agents. It generates speech incrementally, returning base64-encoded PCM audio chunks at a 24kHz sample rate, which suits interruptible, conversational pipelines better than batch file rendering. Authentication uses a Gradium API key. Callers choose a voice by its identifier and can optionally pass a model name and JSON configuration settings.
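To make the chunk format concrete, here is a minimal sketch of how a client might reassemble base64-encoded PCM chunks into a playable WAV file. Only the base64 encoding and the 24kHz sample rate come from this page. The 16-bit sample width, mono channel layout, and the helper names are assumptions, and the chunks here are synthetic silence standing in for data received from the stream; the actual WebSocket message schema is not shown.

```python
import base64
import io
import wave

SAMPLE_RATE_HZ = 24_000   # stated on this page
SAMPLE_WIDTH_BYTES = 2    # assumption: 16-bit PCM samples
CHANNELS = 1              # assumption: mono audio


def chunks_to_wav(b64_chunks):
    """Decode base64 PCM chunks and wrap the raw audio in a WAV container."""
    pcm = b"".join(base64.b64decode(chunk) for chunk in b64_chunks)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def duration_seconds(wav_bytes):
    """Return the playback length of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Stand-in for streamed chunks: two half-second blocks of silence.
silence = bytes(SAMPLE_RATE_HZ // 2 * SAMPLE_WIDTH_BYTES * CHANNELS)
fake_chunks = [base64.b64encode(silence).decode("ascii")] * 2

wav_bytes = chunks_to_wav(fake_chunks)
print(f"Decoded {duration_seconds(wav_bytes):.1f} s of audio")
```

A real-time agent would usually hand each decoded chunk to an audio output as it arrives instead of buffering the whole utterance; buffering is used here only to keep the sketch short.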

The service is integrated into common agent stacks: plugin support exists for both Pipecat and LiveKit, where Gradium can act as the TTS provider inside an agent session or as a standalone speech generator. Pipecat exposes runtime-configurable settings that can be updated mid-conversation, reflecting the model's focus on live interaction.

This page describes the first cataloged release in the Gradium TTS family, so no earlier version in the family is available here for a generational comparison.

Separately, Gradium positions its cloud API for use cases that need broad language and speaker coverage, while its on-device model, Phonon, targets offline, privacy-sensitive, and high-volume consumer deployments where cloud synthesis is not the right architecture.

Sources

TTS/.models.json · praveenchordia/tts at main · huggingface.co

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia