Qwen 3 235B A22B Thinking 2507
About this model
Qwen 3 235B A22B Thinking 2507 is Alibaba's reasoning-focused refresh of its flagship Mixture-of-Experts model. Per its model card, it activates roughly 22 billion of its 235 billion total parameters per forward pass, is organized across 94 layers with 128 experts (8 active per token), and natively handles up to 262,144 tokens of context. For this deployment, the catalog lists a smaller 128K context window and FP8 quantization.
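To put those figures in proportion, the short sketch below works out the share of parameters and experts that are active per token, and how the catalog window compares with the native one. The numbers come straight from the paragraph above; reading "128K" as 128 × 1024 tokens is an assumption, and the constant names are illustrative only.

```python
# Figures quoted from the model card and catalog, as stated above.
TOTAL_PARAMS_B = 235       # total parameters, in billions
ACTIVE_PARAMS_B = 22       # parameters activated per forward pass, in billions
NUM_EXPERTS = 128          # experts available
ACTIVE_EXPERTS = 8         # experts active per token
NATIVE_CONTEXT = 262_144   # native context length, in tokens

# Assumption: the catalog's "128K" means 128 * 1024 tokens.
CATALOG_CONTEXT = 128 * 1024

# Share of the model that does work on any single token.
active_param_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
active_expert_share = ACTIVE_EXPERTS / NUM_EXPERTS

# How many catalog-sized windows fit in the native window.
context_ratio = NATIVE_CONTEXT // CATALOG_CONTEXT

print(f"active parameter share: {active_param_share:.2%}")
print(f"active expert share:    {active_expert_share:.2%}")
print(f"native / catalog context: {context_ratio}x")
```

The active parameter share comes out somewhat higher than the active expert share. A plausible reading is that non-expert components such as attention layers and embeddings run for every token, although the source does not break the parameter count down.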
The key change from its predecessor is structural. The original Qwen3-235B-A22B could switch between thinking and non-thinking modes within a single model. The July 2025 "2507" update split that hybrid design into two specialized siblings: the non-thinking Qwen 3 235B A22B Instruct 2507 and this Thinking variant, which always emits reasoning traces. Alibaba describes the Thinking model as having increased thinking length and enhanced long-context understanding, and recommends it for highly complex reasoning tasks.
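Because this variant always emits a reasoning trace, client code usually needs to separate the trace from the final answer. The helper below is a minimal sketch of that step. It assumes the trace is terminated by a `</think>` marker and that the opening `<think>` tag may be absent from the output; neither detail is stated on this page, so check the raw output of your deployment before relying on it. The function name `split_reasoning` is hypothetical.

```python
from typing import Tuple


def split_reasoning(text: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>") -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumption: the reasoning trace ends at the last `close_tag`, and the
    opening tag may or may not be present. If no closing tag is found,
    the whole text is treated as the answer.
    """
    head, sep, tail = text.rpartition(close_tag)
    if not sep:
        # No closing marker: nothing we can identify as a trace.
        return "", text.strip()

    reasoning = head.strip()
    if reasoning.startswith(open_tag):
        # Drop the opening tag when the server includes it.
        reasoning = reasoning[len(open_tag):].strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning("2 + 2 equals 4.\n</think>\n\nThe answer is 4.")
print(reasoning)
print(answer)
```

Splitting on the last closing marker rather than the first keeps the answer intact even if the trace itself happens to mention the tag.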
According to the Qwen team's model card, this release improves on the previous generation in reasoning, instruction following, and agentic capabilities. The model supports tool calling through Qwen-Agent and targets technical work: multi-step mathematics, scientific analysis, algorithmic coding, and detailed document processing. Its open weights are released under the permissive Apache 2.0 license, which makes the model suitable for self-hosted research and enterprise pipelines.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the listed data sources.
Research & Papers
Primary reference paper for this model family, sourced from the HuggingFace model card.
Data sources: Venice API · HuggingFace · Wikipedia · arXiv