Tencent · 🖼️ Image Generation

Hunyuan Image 3.0

private
Quick reference
Hunyuan Image 3.0 — TLDR
  • 🏢 Tencent's native multimodal text-to-image model in the Hunyuan family.
  • 📏 80B total parameters with roughly 13B activated via Mixture-of-Experts.
  • 🧠 Unified autoregressive framework instead of the prevalent DiT design.
  • 👁️ World-knowledge reasoning elaborates sparse prompts into richer scenes.
  • 🆕 Instruct variant adds reasoning and image-to-image creative editing.
  • ⚡ Distilled checkpoint enables efficient roughly 8-step sampling.
  • 🎯 Photorealistic output with strong prompt adherence and fine detail.
  • 📚 Handles long, detailed prompts spanning complex multi-subject scenes.
💰 Pricing: $0.090 per image
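As a rough budgeting aid, the listed per-image price can be turned into a batch cost estimate. The sketch below assumes the flat $0.090 rate shown on this page applies to every image regardless of size or settings, which the page does not confirm; the function names are illustrative and not part of any Venice API.

```python
from decimal import Decimal

# Listed price on this page; assumed flat per image (an assumption, not a confirmed billing rule).
PRICE_PER_IMAGE_USD = Decimal("0.090")


def batch_cost(num_images: int) -> Decimal:
    """Return the estimated cost in USD for generating num_images images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE_USD * num_images


def images_within_budget(budget_usd: str) -> int:
    """Return how many whole images fit in a budget given as a decimal string."""
    return int(Decimal(budget_usd) // PRICE_PER_IMAGE_USD)


if __name__ == "__main__":
    print(batch_cost(250))             # cost of a 250-image batch
    print(images_within_budget("10"))  # images affordable with 10 USD
```

`Decimal` is used instead of `float` so that currency arithmetic stays exact.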
📅 On Venice since: Mar 1, 2026 (95 days ago)
Provider

Tencent is a Chinese multinational technology conglomerate headquartered in Shenzhen. One of the highest-grossing multimedia companies in the world by revenue, Tencent has built a vast portfolio spanning gaming, social media, fintech, and cloud services. In…

1 model on Venice
1 image
Added Mar 1, 2026

About this model

Hunyuan Image 3.0 is Tencent's text-to-image generator and the latest entry in the company's Hunyuan image line. Tencent describes it as a native multimodal model that unifies multimodal understanding and generation within a single autoregressive framework, with the image-generation module released openly. The architecture is a Mixture-of-Experts design with roughly 80 billion total parameters, of which about 13 billion are activated during inference, and it uses a Transfusion-style approach to bind text and image tokens.
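The parameter counts above can be put in perspective with some quick arithmetic. The following is a minimal sketch, assuming the rounded figures from this page (80B total, 13B active) and assuming weights stored in bf16 at 2 bytes per parameter; the storage format is an assumption for illustration, and the estimate ignores activations, caches, and framework overhead.

```python
# Rounded figures taken from this page.
TOTAL_PARAMS = 80_000_000_000
ACTIVE_PARAMS = 13_000_000_000

# Assumption: bf16 storage, 2 bytes per parameter.
BYTES_PER_PARAM_BF16 = 2


def active_fraction(total: int, active: int) -> float:
    """Fraction of all parameters that take part in a forward pass."""
    return active / total


def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.2%}")
    print(f"weights at bf16: {weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM_BF16):.0f} GB")
```

In a Mixture-of-Experts model, compute scales with the active parameters, but all experts generally still have to be held in memory, so under these assumptions self-hosting needs to budget for the full 80B weights rather than the 13B active subset.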

The most concrete change from earlier Hunyuan image models is architectural. According to the technical report, version 3.0 moves beyond the prevalent DiT-based (Diffusion Transformer) architectures to a unified autoregressive framework that models text and image modalities more directly, which the team links to more contextually rich generation. It also adds world-knowledge reasoning, automatically expanding sparse prompts with contextually appropriate detail.

On the feature side, the model emphasizes photorealistic imagery, strong prompt adherence, and fine-grained detail, alongside support for long, detailed prompts spanning multiple subjects and lighting parameters. The arXiv technical report details the data curation and post-training reinforcement learning behind these behaviors.

Tencent has since shipped additional checkpoints: an Instruct release adding reasoning-based prompt enhancement and image-to-image editing, plus a distilled variant tuned for efficient deployment with roughly 8-step sampling. Weights and code are available through Tencent's Hugging Face repository for self-hosting.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies, so verify critical details against the sources listed below.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago