Hunyuan Image 3.0

private

Quick reference

Hunyuan Image 3.0 — TLDR

🏢 Tencent's native multimodal text-to-image model in the Hunyuan family.
📏 80B total parameters with roughly 13B activated via Mixture-of-Experts.
🧠 Unified autoregressive framework instead of the prevalent DiT design.
👁️ World-knowledge reasoning elaborates sparse prompts into richer scenes.
🆕 Instruct variant adds reasoning and image-to-image creative editing.
⚡ Distilled checkpoint enables efficient roughly 8-step sampling.
🎯 Photorealistic output with strong prompt adherence and fine detail.
📚 Handles long, detailed prompts spanning complex multi-subject scenes.

💰 Pricing

$0.090

per image

📅 On Venice since

Mar 1, 2026

Provider

Tencent

Tencent is a Chinese multinational technology conglomerate headquartered in Shenzhen. One of the highest-grossing multimedia companies in the world by revenue, Tencent has built a vast portfolio spanning gaming, social media, fintech, and cloud services. In…

1 model on Venice

1 image

Added Mar 1, 2026

About this model

Hunyuan Image 3.0 is Tencent's text-to-image generator and the latest entry in the company's Hunyuan image line. Tencent describes it as a native multimodal model that unifies multimodal understanding and generation within a single autoregressive framework, with the image-generation module released openly. The architecture pairs a Mixture-of-Experts design with roughly 80 billion total parameters and about 13 billion activated during inference, using a Transfusion-style approach to bind text and image tokens.
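
To make the Mixture-of-Experts figures concrete, here is a minimal arithmetic sketch in Python. The 80B and 13B values come from the description above and are approximate. The 2 bytes per parameter (16-bit weights) is an assumption for illustration, not a figure from Tencent, and the function names are hypothetical.

```python
# Figures from the description above; both are approximate ("roughly").
TOTAL_PARAMS = 80e9   # total parameters across all experts
ACTIVE_PARAMS = 13e9  # parameters activated during inference

# Assumption (not from the source): weights stored in 16-bit precision.
BYTES_PER_PARAM = 2


def active_fraction(total: float, active: float) -> float:
    """Share of the parameters that take part in a single forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"weights to hold in memory: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

Under these assumptions about 16% of the parameters are active per forward pass, yet a self-hosted deployment would still need on the order of 160 GB for the weights, since every expert has to be available even when only some are used.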

The most concrete change from earlier Hunyuan image models is architectural. According to the technical report, version 3.0 moves away from the prevalent DiT-based design to a unified autoregressive framework that models text and image modalities more directly, which the team links to more contextually rich generation. It also adds world-knowledge reasoning, which automatically expands sparse prompts with contextually appropriate detail.

On the feature side, the model emphasizes photorealistic imagery, strong prompt adherence, and fine-grained detail, alongside support for long, detailed prompts spanning multiple subjects and lighting parameters. The arXiv technical report details the data curation and post-training reinforcement learning behind these behaviors.

Tencent has since shipped additional checkpoints: an Instruct release adding reasoning-based prompt enhancement and image-to-image editing, plus a distilled variant tuned for efficient deployment with roughly 8-step sampling. Weights and code are available through Tencent's Hugging Face repository for self-hosting.

Sources

HunyuanImage 3.0 Technical Report (arxiv.org)

tencent/HunyuanImage-3.0 · Hugging Face (huggingface.co)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 4d ago