Z.ai · 🖼️ Image Generation

Z-Image Turbo

private


Quick reference

Z-Image Turbo — TLDR

- 🆕 Distilled, speed-optimized text-to-image model from the Tongyi-MAI team
- 📏 6-billion-parameter single-stream diffusion transformer (S3-DiT)
- ⚡ Sub-second inference on H800 GPUs using only 8 NFEs
- 🔧 Designed to fit within 16GB VRAM on consumer cards
- 🌐 Bilingual English–Chinese text rendering
- 🎯 Generates images up to 1024x1024 resolution
- 🔒 Open weights under the Apache 2.0 license
- 📚 Roughly 919K downloads on Hugging Face

💰 Pricing

$0.010

per image

📅 On Venice since

Dec 3, 2025

227 days ago

Provider

Z.ai

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…


12 models on Venice

11 text · 1 image

Since Apr 1, 2024



About this model

Z-Image Turbo is the fast, distilled variant in the Z-Image image-generation family, published on Hugging Face by the Tongyi-MAI team with open weights under the Apache 2.0 license. It is a 6-billion-parameter text-to-image model built on a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture, generating images up to 1024x1024 resolution.

The defining characteristic is efficiency. Where the full Z-Image foundation model runs many diffusion steps with classifier-free guidance, the Turbo edition is distilled to produce output in only 8 function evaluations (NFEs), with the guidance scale set to zero. According to the maintainer's model card, this enables sub-second inference latency on enterprise-grade H800 GPUs while still fitting within 16GB of VRAM on consumer devices. The trade-off is some loss of pose diversity and aesthetic richness in exchange for speed.
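As a rough illustration of the settings described above, the sketch below checks a generation request against the values stated on this page: 8 function evaluations, a guidance scale of zero, and a maximum size of 1024x1024. The `TurboRequest` record, its field names, and the `validate` helper are hypothetical and exist only for this example; they are not part of any Venice or Hugging Face API.

```python
from dataclasses import dataclass

# Values taken from this page; everything else below is illustrative.
MAX_SIDE = 1024        # maximum width/height stated for the model
TURBO_NFES = 8         # function evaluations the Turbo edition is distilled to
TURBO_GUIDANCE = 0.0   # guidance scale stated for the Turbo edition


@dataclass(frozen=True)
class TurboRequest:
    """Hypothetical request record; field names are assumptions for this sketch."""
    prompt: str
    width: int = MAX_SIDE
    height: int = MAX_SIDE
    nfes: int = TURBO_NFES
    guidance_scale: float = TURBO_GUIDANCE


def validate(req: TurboRequest) -> list[str]:
    """Return a list of problems; an empty list means the request matches the stated settings."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if req.width <= 0 or req.height <= 0:
        problems.append("width and height must be positive")
    if req.width > MAX_SIDE or req.height > MAX_SIDE:
        problems.append(
            f"resolution {req.width}x{req.height} exceeds {MAX_SIDE}x{MAX_SIDE}"
        )
    if req.nfes != TURBO_NFES:
        problems.append(f"expected {TURBO_NFES} function evaluations, got {req.nfes}")
    if req.guidance_scale != TURBO_GUIDANCE:
        problems.append(
            f"expected guidance scale {TURBO_GUIDANCE}, got {req.guidance_scale}"
        )
    return problems
```

A request built with the defaults passes, while one that asks for a 2048-pixel side or a non-zero guidance scale is flagged before any GPU time is spent.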

Compared with the non-distilled Z-Image, the Turbo variant is positioned for rapid iteration and real-time use rather than maximum quality and diversity, making it suited to local deployment and high-throughput pipelines. It supports bilingual English–Chinese text rendering and includes a prompt enhancer for improved instruction following.
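For local deployment, a back-of-the-envelope estimate shows why a 6-billion-parameter model can plausibly fit on a 16GB card. The sketch below counts only the memory needed to hold the transformer weights. The precision is an assumption (this page does not state one), and the text encoder, VAE, and activations need additional memory that is not modeled here.

```python
PARAMS = 6_000_000_000  # 6 billion parameters, as stated on this page

# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_gib(params: int, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


def weights_fit(params: int, dtype: str, vram_gib: float = 16.0) -> bool:
    """True if the weights alone fit in the given VRAM budget (no activations counted)."""
    return weight_gib(params, dtype) <= vram_gib


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        print(dtype, round(weight_gib(PARAMS, dtype), 1), weights_fit(PARAMS, dtype))
```

Under these assumptions, 16-bit weights take about 11.2 GiB, which leaves some headroom on a 16GB card, while 32-bit weights (about 22.4 GiB) would not fit.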

The maintainer reports Elo-based human-preference evaluation on Alibaba AI Arena. The weights are freely available on Hugging Face, and the Apache 2.0 license permits commercial use, provided the license text and existing copyright notices are retained.
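For readers unfamiliar with Elo-based preference ratings, the sketch below shows the textbook Elo update applied to a single pairwise comparison between two models. The 400-point scale and the K-factor of 32 are conventional defaults, not values taken from the model card, and the exact procedure used by the Arena is not described on this page.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's image was preferred, 0.0 if B's was, 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models; a rater prefers A's image.
    print(update(1500.0, 1500.0, 1.0))
```

With equal starting ratings the expected score is 0.5, so a single win moves each model by half the K-factor in opposite directions.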


Sources

Tongyi-MAI/Z-Image-Turbo · Hugging Face (huggingface.co)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

3 reference papers linked from the HuggingFace model card.

arXiv 2511.22699 · Nov 2025

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer (2025)

Z-Image Team, Huanqia Cai, Sihan Cao et al.

The landscape of high-performance image generation models is currently dominated by proprietary systems, such as Nano Banana Pro and Seedream 4.0. Leading open-source alternatives, including Qwen-Image, Hunyuan-Image-3.0 and FLUX.2, are characterized by massive parameter counts…

arXiv 2511.22677 · Nov 2025

Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield (2025)

Dongyang Liu, Peng Gao, David Liu et al.

Diffusion model distillation has emerged as a powerful technique for creating efficient few-step and single-step generators. Among these, Distribution Matching Distillation (DMD) and its variants stand out for their impressive performance, which is widely attributed to their…

arXiv 2511.13649 · Nov 2025

Distribution Matching Distillation Meets Reinforcement Learning (2025)

Dengyang Jiang, Dongyang Liu, Zanyi Wang et al.

Distribution Matching Distillation (DMD) facilitates efficient inference by distilling multi-step diffusion models into few-step variants. Concurrently, Reinforcement Learning (RL) has emerged as a vital tool for aligning generative models with human preferences. While both…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago