Hexgrad · 🔊 Text to Speech · VS Pick

Kokoro Text to Speech

private
Quick reference
Kokoro Text to Speech — TLDR
  • 🆕 Open-weight 82M-parameter TTS model from developer Hexgrad.
  • 📏 Compact size positions it for efficient inference, per catalog.
  • 🔒 Apache-2.0 licensed and freely deployable.
  • 🌐 Version 1.0 distributed as a community ONNX port.
  • 🔧 Available via Hugging Face weights and a hosted demo Space.
  • 💬 Converts written text into natural-sounding speech audio.
  • 📚 Popular open-source baseline for voice applications.
  • ⚡ Released March 2025 by an individual open-source developer.
💰 Pricing: $3.50 per 1M chars
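
As a rough illustration of the listed rate of $3.50 per 1M characters, the sketch below estimates the cost of a synthesis job. It assumes billing scales linearly with character count and that there is no minimum charge or rounding rule; the listing does not confirm either, and the function name `estimate_cost_usd` is hypothetical.

```python
from decimal import Decimal

# Listed catalog rate in USD per one million input characters.
PRICE_PER_MILLION_CHARS = Decimal("3.50")


def estimate_cost_usd(num_chars: int) -> Decimal:
    """Estimate the synthesis cost for num_chars characters.

    Assumption: cost is strictly proportional to character count.
    How the provider counts characters (whitespace, markup) is not
    stated in the listing.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return PRICE_PER_MILLION_CHARS * num_chars / Decimal(1_000_000)


if __name__ == "__main__":
    # A 500,000-character manuscript under the linear-billing assumption.
    print(estimate_cost_usd(500_000))
```

Under these assumptions, a 500,000-character manuscript would come to about $1.75.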
📅 On Venice since: Mar 19, 2025 (441 days ago)
Provider

Hexgrad is an AI development group focused on speech synthesis. The organization is best known for creating Kokoro, a lightweight yet capable text-to-speech model that has attracted attention in the open-weight AI community for delivering high-quality voice…

1 model on Venice (1 TTS) · Added Mar 19, 2025

About this model

Kokoro is an open-weight text-to-speech model created by Hexgrad, an individual open-source developer, and released in this v1.0 form in March 2025. With just 82 million parameters, it takes text in and produces audio out, positioning it as a compact alternative to far larger speech systems. Its small footprint is its defining feature, and the catalog describes it as offering efficient inference and natural-sounding voices.
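
To put the 82 million parameter figure in perspective, the following back-of-the-envelope sketch estimates the size of the weights alone at a few common numeric precisions. The precisions shown are assumptions for illustration; the listing does not say which precision the published weights use, and real files also carry voice data and metadata that this ignores.

```python
# Parameter count taken from the listing.
KOKORO_PARAMS = 82_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes).

    Counts parameters only; activations, voice embeddings, and
    runtime overhead are not included.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_size_mb(KOKORO_PARAMS, dtype):.0f} MB")
```

Even at full 32-bit precision the weights come to roughly 328 MB by this estimate, which is consistent with the catalog's emphasis on compact size.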

The weights are distributed under the permissive Apache-2.0 license, making the model freely deployable across both production and hobby settings. A community ONNX port of the v1.0 release is published on Hugging Face, broadening the runtimes and platforms on which the model can operate.

The ecosystem around Kokoro includes Hexgrad's hosted demo Space on Hugging Face, where the model can be tried directly in the browser. As the catalog notes, it remains an efficient, low-cost open-source baseline for developers building voice interfaces, audiobooks, and accessibility applications.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the data sources listed on this page.

Research & Papers

2 reference papers linked from the Hugging Face model card.

arXiv:2306.07691 · Jun 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models (2023)

Yinghao Aaron Li, Cong Han, Vinay S. Raghavan et al.

In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis. StyleTTS 2 differs from its predecessor by modeling styles as a latent random…

arXiv:2203.02395 · Mar 2022

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform (2022)

Takuhiro Kaneko, Kou Tanaka, Hirokazu Kameoka et al.

In recent text-to-speech synthesis and voice conversion systems, a mel-spectrogram is commonly applied as an intermediate representation, and the necessity for a mel-spectrogram vocoder is increasing. A mel-spectrogram vocoder must solve three inverse problems: recovery of the…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago