Alibaba · 💬 Text Generation

Qwen 3 235B A22B Instruct 2507

Function Calling · Web Search · fp8 · private
Quick reference
Qwen 3 235B A22B Instruct 2507 — TLDR
  • 🧠 Mixture-of-experts: 235B total parameters, 22B active per token.
  • ⚡ Non-thinking variant: direct answers, no reasoning traces.
  • 📏 Model card cites 256K native context, extendable toward 1M tokens.
  • 🔧 Strong tool-calling and agentic use via Qwen-Agent and MCP.
  • 🌐 Multilingual coverage across many languages and dialects.
  • 🔒 Apache 2.0 license, offered here in FP8 quantization.
  • 🏢 Built by Alibaba's Qwen team.
  • 🎯 Aimed at long documents, technical work, high-precision tasks.
💰 Pricing
$0.150 / $0.750
per 1M · input / output
📏 Context
128K tokens
📅 On Venice since
Apr 29, 2025
400 days ago
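
The listed rates translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name and the example token counts are hypothetical, and it assumes billing is strictly linear in tokens at the $0.150 input and $0.750 output rates per 1M tokens shown above.

```python
# Rates copied from the pricing tile above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.150
OUTPUT_USD_PER_MILLION = 0.750


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming purely linear billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 100,000 prompt tokens and 2,000 completion tokens.
    print(f"${estimate_cost_usd(100_000, 2_000):.4f}")
```

Because output tokens cost five times as much as input tokens at these rates, long completions dominate the bill faster than long prompts do.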
Provider

Alibaba Group is a Chinese multinational technology company founded in 1999 and headquartered in Hangzhou, Zhejiang. Originally built around e-commerce and cloud computing, Alibaba has become one of the most prolific contributors to open-weight AI research,…

46 models on Venice
17 text · 16 video · 5 image · 4 inpaint · 2 embedding · 2 tts
Since Jan 11, 2025

About this model

Qwen 3 235B A22B Instruct 2507 is a flagship Mixture-of-Experts model from Alibaba's Qwen team, released in 2025. It holds 235 billion total parameters but activates roughly 22 billion per forward pass, balancing capacity with inference cost. This is the instruction-tuned "non-thinking" line, meaning it returns direct responses without producing intermediate reasoning blocks, making outputs faster and more format-consistent than reasoning-chain variants. It carries an Apache 2.0 license and is offered here in FP8, which reduces memory footprint versus full precision.
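
The parameter counts and the FP8 note above can be turned into rough numbers. This is a back-of-the-envelope sketch, not a measurement: it assumes 1 byte per parameter for FP8, uses 16-bit and 32-bit formats as two possible readings of "full precision", and counts weights only (no KV cache, activations, or quantization scale factors). The function names are hypothetical.

```python
TOTAL_PARAMS = 235e9   # total parameters, from the description above
ACTIVE_PARAMS = 22e9   # approximate parameters active per forward pass


def active_fraction(total: float = TOTAL_PARAMS,
                    active: float = ACTIVE_PARAMS) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active share:   {active_fraction():.1%}")
    print(f"FP8 weights:    {weight_memory_gb(TOTAL_PARAMS, 1):.0f} GB")
    print(f"16-bit weights: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")
    print(f"32-bit weights: {weight_memory_gb(TOTAL_PARAMS, 4):.0f} GB")
```

Under these assumptions, roughly 9% of the parameters participate in each forward pass, while all 235B still have to be held in memory, which is why quantization matters for serving even though per-token compute is modest.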

The 2507 update is positioned as the refreshed version of the original Qwen3-235B-A22B non-thinking mode. Per Qwen's model card, it improves on that predecessor in instruction following, logical reasoning, text comprehension, mathematics, science, coding, multilingual understanding, and tool usage. The card also describes enhanced 256K long-context understanding, with configurations enabling ultra-long inputs toward one million tokens. These are model-card figures; the context window listed on this page is 128K tokens.

Its closest sibling is Qwen 3 235B A22B Thinking 2507, which shares the same architecture but generates explicit reasoning chains for complex problems, trading latency and token use for deeper deliberation. For vision and multimodal work, the family extends to Qwen3 VL 235B.

In practice, this Instruct variant suits high-throughput, latency-sensitive workloads such as chatbots, API integrations, document analysis, and code generation, where consistent formatting matters more than visible step-by-step reasoning. Self-hosting the weights demands substantial hardware, typically multiple GPUs running tensor parallelism.
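
To give a sense of why multi-GPU deployment is typical, the sketch below estimates how many GPUs are needed just to hold the weights. Everything beyond the 235B parameter count is an assumption for illustration: about 1 byte per parameter in FP8 (roughly 235 GB of weights), hypothetical 80 GB accelerators, 80% of each device's memory available for weights, and a power-of-two tensor-parallel degree, which is a common convention rather than a requirement stated by the source.

```python
def min_tensor_parallel_degree(weight_gb: float,
                               gpu_memory_gb: float,
                               usable_fraction: float = 0.8) -> int:
    """Smallest power-of-two GPU count whose usable memory holds the weights.

    Weights only: KV cache and activations need additional headroom.
    """
    if weight_gb <= 0 or gpu_memory_gb <= 0:
        raise ValueError("sizes must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    degree = 1
    # Double the GPU count until the combined usable memory covers the weights.
    while degree * gpu_memory_gb * usable_fraction < weight_gb:
        degree *= 2
    return degree


if __name__ == "__main__":
    # Assumed figures: ~235 GB of FP8 weights on hypothetical 80 GB GPUs.
    print(min_tensor_parallel_degree(235, 80))
    # The same model at 16 bits per parameter (~470 GB of weights).
    print(min_tensor_parallel_degree(470, 80))
```

With these assumed figures the estimate is 4 GPUs for FP8 weights and 8 for 16-bit weights; real deployments also have to budget for long-context KV cache, so treat the result as a lower bound.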

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the data sources listed on this page.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago